Claude for QA engineers: 10 real scenarios and prompts
AI assistants have moved from “novelty” to a legitimate working tool. For QA engineers this is especially powerful: routine tasks like generating test cases, parsing logs, writing regular expressions, and preparing test data can be delegated to the model so you can focus on the substantive part of the job.
Below are 10 concrete scenarios where Claude saves real hours, with example prompts and realistic expectations. The patterns apply to any LLM (ChatGPT, Gemini), but Claude tends to do better on long-context tasks and precision.
1. Generating test cases from requirements
The most direct use case. Input: requirement text, a user story, or a Jira description. Output: a structured set of positive, negative and edge-case tests.
I’m a QA engineer. Read the requirement below and generate a checklist of test cases: positive, negative, boundary, security. For each case — name, steps, expected result. Format: markdown table.
Quality depends heavily on requirement completeness. A trivial “search must work” gets template cases; a detailed “search by first and last name, case-insensitive, 2-30 characters, exact and partial match” gets 40+ concrete cases with boundary values.
Tip: always give the model “context” — what the product is, the platform, the constraints. Without that, it generates generic cases that don’t account for, say, your mobile game having no keyboard.
2. Parsing bugs and stack traces
You see an exception in the log; Claude helps you quickly understand what actually happened and how execution could have reached that point. Especially useful with unfamiliar tech (Unity, Cocoa, Android internals).
Explain what this stacktrace means and what the possible causes are. I’m a QA, comfortable with high-level logic but not deeply with Unity internals.
Then you can iterate: “If the input index was 0, could it have reached this branch?”, “What tests would have caught this earlier?” — the model reasons surprisingly well in dialogue.
3. Generating test data
Claude is especially strong on boundary values, valid and invalid strings, and locale-specific edge cases.
Generate 30 strings to test an “email” field: valid (including edge cases like plus-addressing, IDN, long ones), invalid with various error types, potentially dangerous (SQL injection, XSS, path traversal). Format: CSV with columns email,expected_result,why.
Don’t forget to verify the generated data manually. The model occasionally misses rare but valid cases (e.g. an email with a quoted local-part).
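A minimal sketch of folding such generated cases into a data-driven check; the rows are illustrative and the validator is a deliberately simplified stand-in for the real one under test:

```typescript
// Illustrative rows only: in practice you would paste or import the CSV Claude produced.
type EmailCase = { email: string; expected: "valid" | "invalid"; why: string };

const cases: EmailCase[] = [
  { email: "user+tag@example.com", expected: "valid", why: "plus-addressing" },
  { email: "user@münchen.de", expected: "valid", why: "IDN domain" },
  { email: "user@@example.com", expected: "invalid", why: "double @" },
];

// Deliberately simplified stand-in; swap in the validator your product actually uses.
const looksValid = (s: string) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s);

for (const c of cases) {
  const actual = looksValid(c.email) ? "valid" : "invalid";
  if (actual !== c.expected) {
    console.log(`MISMATCH: ${c.email} (${c.why}): expected ${c.expected}, got ${actual}`);
  }
}
```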
4. Localization: pluralization and edge cases
If your product ships in 5+ languages, Claude is great at finding problematic cases. “Show me all plural forms of N for Russian / Polish / Arabic” → you get ready-made tables with CLDR rules.
I’m testing localization for RU/EN/DE/AR. For each language show: 1) how many plural forms and which; 2) which numbers trigger rare forms; 3) typical localization bugs in these languages that are easy to miss. Give a counter-example for each.
Also: “Generate 10 strings in German of varying length — short, medium, long compounds — for testing UI overflow.” You get Bestätigung, Geschwindigkeitsbegrenzungseinrichtung, etc.
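You can also cross-check the model’s plural-form claims locally: the standard Intl.PluralRules API is CLDR-backed and available in Node and modern browsers. A small sketch:

```typescript
// Print the CLDR plural category for a handful of numbers in each locale under test.
const locales = ["ru", "en", "de", "ar"];
const samples = [0, 1, 2, 5, 11, 21, 100, 101];

for (const locale of locales) {
  const rules = new Intl.PluralRules(locale);
  console.log(`${locale}: ${samples.map((n) => `${n}->${rules.select(n)}`).join("  ")}`);
}
// For "ru" you should see e.g. 1->one, 21->one, 2->few, 5->many, 11->many.
```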
5. Turning a vague bug into a proper bug report
You have a sloppy “it doesn’t work, you tap and nothing happens” from a PM. Claude helps structure it:
Rewrite this raw bug report into a proper one with Steps to Reproduce, Expected, Actual, Environment, Severity sections. Don’t invent details not present in the source — flag what needs clarification.
A real time saver, especially if you write 5-10 tickets a day.
6. Help with regex, JQL, XPath
This is an area where LLMs are close to 10/10. “Write a regex for RFC 5322 email validation” / “JQL for all bugs in SH for the last 2 weeks, unresolved, no assignee” / “XPath to the 3rd button inside a div with class dialog__actions”: you get an answer in seconds.
Still worth checking boundary cases, though. Email and URL regexes in particular are two “cursed” examples where even LLMs sometimes slip.
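For XPath specifically, you can sanity-check the model’s suggestion right in the browser DevTools console before trusting it; a minimal sketch, assuming the class name from the prompt above:

```typescript
// One possible reading of “the 3rd button inside a div with class dialog__actions”.
const xpath = "(//div[contains(@class, 'dialog__actions')]//button)[3]";

const result = document.evaluate(
  xpath,
  document,
  null,
  XPathResult.FIRST_ORDERED_NODE_TYPE,
  null
);

// null means the XPath matched nothing: fix it before putting it into a test or bug report.
console.log(result.singleNodeValue);
```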
7. Comparing API responses and logs between builds
Paste two JSON responses from the same endpoint (builds 1.5 and 1.6) and ask the model to show the diff with interpretation.
These are JSON responses of /api/levels from builds 1.5 and 1.6. Show semantic differences (not formatting). What could break on the client if it didn’t expect these changes?
Especially useful for contract testing along the lines of “the backend changed a field and the client didn’t know”.
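When the payloads are too big to paste, the same idea works as a small local script (which Claude can also write for you); a sketch of a recursive key-level diff, with made-up field names:

```typescript
// Recursively compare two JSON objects and report added, removed, and changed fields.
type Json = Record<string, unknown>;

function diffKeys(a: Json, b: Json, path = ""): string[] {
  const out: string[] = [];
  for (const k of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const p = path ? `${path}.${k}` : k;
    if (!(k in a)) out.push(`+ ${p} (added in the new build)`);
    else if (!(k in b)) out.push(`- ${p} (removed in the new build)`);
    else if (
      a[k] && b[k] &&
      typeof a[k] === "object" && typeof b[k] === "object" &&
      !Array.isArray(a[k]) && !Array.isArray(b[k])
    ) {
      out.push(...diffKeys(a[k] as Json, b[k] as Json, p)); // descend into nested objects
    } else if (JSON.stringify(a[k]) !== JSON.stringify(b[k])) {
      out.push(`~ ${p}: ${JSON.stringify(a[k])} -> ${JSON.stringify(b[k])}`);
    }
  }
  return out;
}

// Hypothetical responses from builds 1.5 and 1.6.
const v15 = { levels: 50, reward: { coins: 100 } };
const v16 = { levels: 50, reward: { coins: 100, gems: 5 }, abTest: "B" };
console.log(diffKeys(v15, v16).join("\n"));
```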
8. Analyzing UI screenshots for test cases
Modern Claude models work with images. Drop in a screenshot of an app screen and the model describes what’s on it and generates a list of test cases.
I’m a QA looking at this mobile game screen. Generate a list of test cases: visual checks, functional checks, different screen sizes, different locales, different permission states.
Especially useful for regression checklists for screens delivered by a designer. Design → 30 cases in a minute.
9. Writing automated tests
Selenium, Playwright, Appium, pytest — the model writes tests from a scenario description. Not “the whole thing turnkey”, but a skeleton in minutes.
Write a Playwright test in TypeScript for the following scenario: open login, enter email and password, tap Sign In, verify redirect to /dashboard, verify welcome banner. Use web-first assertions, no waitForTimeout.
If you already have a Page Object pattern — you can feed the model one example and ask it to generate new tests in the same style. This works surprisingly well.
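Roughly the skeleton you can expect back for the prompt above; the locators, credentials, and the welcome-banner test id are assumptions to replace with your real ones:

```typescript
import { test, expect } from "@playwright/test";

test("login redirects to dashboard and shows welcome banner", async ({ page }) => {
  // Assumes baseURL is set in playwright.config.ts.
  await page.goto("/login");

  await page.getByLabel("Email").fill("qa@example.com");
  await page.getByLabel("Password").fill("secret123");
  await page.getByRole("button", { name: "Sign In" }).click();

  // Web-first assertions: they auto-wait, no waitForTimeout needed.
  await expect(page).toHaveURL(/\/dashboard$/);
  await expect(page.getByTestId("welcome-banner")).toBeVisible();
});
```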
10. Explaining unfamiliar technologies
QAs often work at the intersection of technologies — iOS today, Kafka tomorrow, gRPC the day after. Claude explains concepts plainly, without fluff.
Explain to me, a QA engineer, what gRPC is and how it differs from REST. What’s special about testing it: what I should check, what tools to use, what pitfalls to watch for.
It doesn’t replace deep study, but it gives a fast bootstrap, after which you can google the specific details.
What Claude won’t replace
- Eyes. UI bugs, animation edge-cases, broken icons — that’s the human at the device.
- Product knowledge. The model doesn’t know that “level 47” in your game is a special case with different content. Context must be given by a human.
- Runtime access. The model can’t launch your game, tap a button, check network traffic. You do that.
- Creativity outside templates. Tricky exploratory testing that requires “thinking like a breaker” is still where humans win.
Pitfalls and risks
- Hallucinated URLs and APIs. The model confidently invents an endpoint that doesn’t exist. Especially dangerous with vendor-specific APIs. Always verify via curl or Google.
- Stale data. The model has a cutoff date. iOS 19 APIs might be unknown.
- Confidential data. Don’t paste real user logs, PII, or tokens into a public LLM. Use Enterprise mode or local models for sensitive content.
- Overreliance. If a QA stops thinking themselves and just “throws into Claude” — they lose intuition. Use as an accelerator, not a head replacement.
Prompt engineering for QA
A few patterns that improve answer quality:
- Assign a role: “You’re a QA with 10 years of experience in mobile games” — the model changes tone and precision.
- Show the format: “Reply as a markdown table with columns X, Y, Z” — much better than trying to parse prose.
- Give an example: “Here’s one test in the required format — generate 10 more in the same style.”
- Ask for alternatives: “Give 3 different approaches.” Making the model weigh options against each other produces a more considered answer.
- Iterate: don’t try to get the perfect answer on the first prompt. Refine in dialogue, add context.
Claude Code — a separate story
If tasks go beyond “writing text” and involve actual work with a repository, logs, files — there’s Claude Code. It’s a CLI/IDE tool that can read files in your project, run commands, open Jira tickets, hit APIs.
Real QA scenarios where it’s indispensable:
- Analyzing logs in a file: “Grep this 50,000-line logcat for errors, group by type, give me the top 10 problem classes.”
- Bulk creating test cases in TestRail: “Copy the structure from project A to project B via the API.”
- Running a checklist across configs: “Verify that every Localizable.strings file contains the key error.network.” (see the sketch after this list)
- Comparing versions: “What changed in remote_config_defaults.json between 1.5 and 1.6?”
- Creating Jira tickets from test results — via the Atlassian API.
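For instance, the Localizable.strings check above might end up as a small standalone script of the kind Claude Code writes and runs for you; a sketch assuming UTF-8 .strings files and the key name from the example:

```typescript
import * as fs from "fs";
import * as path from "path";

// Key and file name follow the example above; adjust for your project.
const REQUIRED_KEY = '"error.network"';

// Walk the project tree and collect every Localizable.strings file.
function findStringsFiles(dir: string, found: string[] = []): string[] {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) findStringsFiles(full, found);
    else if (entry.name === "Localizable.strings") found.push(full);
  }
  return found;
}

for (const file of findStringsFiles(process.cwd())) {
  // Assumes UTF-8 encoded .strings; UTF-16 files would need a different read.
  const hasKey = fs.readFileSync(file, "utf8").includes(REQUIRED_KEY);
  console.log(`${hasKey ? "OK     " : "MISSING"}  ${file}`);
}
```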
Where to start
- Open claude.ai (or ChatGPT, Gemini — the patterns are the same). Take a fresh task with real requirements. Try generating a test-case checklist — assess coverage.
- Keep a personal collection of prompts in Notion / Obsidian / a .md file. A good prompt is reused dozens of times, so the savings add up.
- After a week or two, pick one routine task you do weekly (regression checklist, bug intake, log reading) and formalize a prompt for it.
- Don’t try to automate everything at once. One use case at a time, until fully integrated into workflow.