This commit is contained in:
2026-04-23 18:13:42 +03:00
commit afb826e85f
3 changed files with 789 additions and 0 deletions

AGENTS.md Normal file

@@ -0,0 +1,287 @@
# AGENTS.md
## Purpose
This agent is used for **website QA/testing** of real pages in a browser environment.
The testing must be performed with **Playwright**.
Visual page inspection must be performed with the **playwright-screenshot-inspector** skill.
The agent must check the **real rendered page**, not just raw HTML.
---
## Required tools and skills
- **Playwright** — for browser automation, page opening, interaction, form submission, and response inspection.
- **playwright-screenshot-inspector** — for screenshot-based visual inspection of the rendered page.
---
## General testing rules
1. Always test the **real live page URL** provided by the user.
2. Always test both:
   - **Desktop / PC viewport**
   - **Mobile viewport**
3. Always capture screenshots for both desktop and mobile.
4. Compare the actual rendered page against the screen/reference provided by the user.
5. **All interactive UI elements must be tested through real interaction** (click, hover, input), not just by inspecting the markup.
6. Report all findings clearly and classify issues by severity:
   - **Critical**
   - **Major**
   - **Minor**
7. Do not stop after the first issue. Complete the full checklist.
8. At the end of testing, create a **summary HTML file** with all findings.
9. All results and summaries should be written in Russian.
10. Split the form submission report into two parts:
    - First, a summary table of form submission results and accessibility.
    - Second, the detailed results in a hidden (collapsed) block.
### Configuration-driven page selection
- Use `pages.json` in the workspace root as the default source of primary pages and visual references.
- If the user asks to **just check the site** without specifying a single page, run the full list from `pages.json`.
- If the user provides a **specific page URL**, first prepare to test that exact URL.
- In that case, ask one short question: whether to also test the primary pages from `pages.json`.
- If the user answers **yes**, test:
  - the explicit URL from the user request
  - the pages from `pages.json`
- Deduplicate identical URLs before running checks.
- If the user answers **no**, test only the explicit URL from the request.
- For visual comparison, use the reference data from `pages.json` when present:
  - `desktop_screenshot`
  - `mobile_screenshot`
  - `figma_url`
- If a page in `pages.json` has no reference assets yet, still test the real page and clearly note that visual comparison is limited by missing reference materials.
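
The selection rules above can be sketched as a small helper. This is a minimal illustration, not part of the agent: the function names `load_primary_urls` and `select_urls` and the `include_primary` flag (the user's yes/no answer) are assumptions, and deduplication is by exact string match only.

```python
import json
from pathlib import Path


def load_primary_urls(config_path="pages.json"):
    """Return the page URLs listed in pages.json, in file order."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [
        page["url"]
        for site in config.get("sites", [])
        for page in site.get("pages", [])
    ]


def select_urls(primary_urls, explicit_url=None, include_primary=False):
    """Build the final, deduplicated list of URLs to test.

    - no explicit URL: run the full list from pages.json
    - explicit URL, user answered "no": test only that URL
    - explicit URL, user answered "yes": explicit URL first, then primary pages
    """
    if explicit_url is None:
        candidates = list(primary_urls)
    elif include_primary:
        candidates = [explicit_url, *primary_urls]
    else:
        candidates = [explicit_url]
    # dict.fromkeys keeps the first occurrence and preserves order
    return list(dict.fromkeys(candidates))
```

Putting the explicit URL first keeps the page the user asked about at the top of the run even when it also appears in `pages.json`.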
---
## Required order of checks
### 1. Visual page and UI interaction check
The user will provide:
- a page URL
- a reference screen/mockup/screenshot
#### 1.1 Open and inspect the real page
- Open the provided page in Playwright.
- Wait until the page is fully loaded and visually stable.
- Test in:
  - Desktop viewport
  - Mobile viewport
#### 1.2 Capture screenshots
- Take full-page screenshots for both desktop and mobile.
- Use **playwright-screenshot-inspector** to inspect the screenshots.
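
A minimal sketch of steps 1.1 and 1.2, assuming a Playwright `browser` object is passed in. The viewport sizes, the `VIEWPORTS` table, and the `capture_viewports` name are illustrative assumptions; this document does not specify exact dimensions. The `is_mobile` option is not supported by every browser engine, so the sketch assumes Chromium.

```python
from pathlib import Path

# Assumed sizes: the rules only say "desktop" and "mobile".
VIEWPORTS = {
    "desktop": {"viewport": {"width": 1440, "height": 900}},
    "mobile": {
        "viewport": {"width": 390, "height": 844},
        "is_mobile": True,
        "has_touch": True,
        "device_scale_factor": 3,
    },
}


def capture_viewports(browser, url, out_dir="screenshots"):
    """Open url once per viewport and save a full-page screenshot for each."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, context_options in VIEWPORTS.items():
        # A separate context per viewport keeps cookies and emulation isolated
        context = browser.new_context(**context_options)
        try:
            page = context.new_page()
            page.goto(url, wait_until="load")
            path = str(Path(out_dir) / f"{name}.png")
            page.screenshot(path=path, full_page=True)
            paths[name] = path
        finally:
            context.close()
    return paths
```

With a real browser this would be called inside a `sync_playwright()` block, for example `capture_viewports(p.chromium.launch(headless=True), url)`. A production run would also wait for visible content before the screenshot, not only for the `load` event.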
#### 1.3 Compare with the provided reference
Check for differences between:
- the real page
- the provided screen/reference
Look for:
- incorrect layout structure
- broken blocks
- missing elements
- different text/content placement
- wrong fonts, sizes, spacing, alignment
- overlap or clipping
- responsive/mobile issues
- hidden or misplaced elements
---
#### 1.4 UI interaction testing (MANDATORY)
If any interactive UI elements are present, they **must be tested**.
This includes (but is not limited to):
- buttons
- links (especially JS-triggered)
- popup / modal windows
- forms opened via buttons
- dropdowns
- accordions
- tabs
- sliders/carousels
- burger menus (mobile)
- tooltips
- filters/sort controls
For each UI element:
- trigger the interaction (click, hover, input, etc.)
- verify it works as expected
- verify correct content is shown
- verify no visual breakage occurs
- verify no JS errors appear
- verify behavior on both desktop and mobile
Check for:
- element not clickable
- no reaction on click
- wrong content shown
- broken animation or layout
- popup not opening/closing
- overlay issues (scroll lock, z-index bugs)
- elements hidden or cut off on mobile
Any broken interaction must be reported.
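
One way to combine "trigger the interaction", "verify correct content is shown", and "verify no JS errors appear" is sketched below. The helper names `attach_error_collectors` and `check_interaction`, the result dictionary shape, and the selectors passed in are assumptions for illustration; the Playwright calls used (`page.on`, `locator().first.click`, `wait_for`) are the standard Python API.

```python
def attach_error_collectors(page):
    """Record uncaught exceptions and console errors raised while testing."""
    errors = []
    # "pageerror" fires for uncaught exceptions thrown inside the page
    page.on("pageerror", lambda exc: errors.append(f"pageerror: {exc}"))
    # "console" fires for every console message; keep only errors
    page.on(
        "console",
        lambda msg: errors.append(f"console: {msg.text}") if msg.type == "error" else None,
    )
    return errors


def check_interaction(page, errors, trigger_selector, expected_selector, timeout=5000):
    """Click one element and verify that the expected content becomes visible."""
    before = len(errors)
    result = {"trigger": trigger_selector, "ok": True, "problem": None}
    try:
        page.locator(trigger_selector).first.click(timeout=timeout)
        page.locator(expected_selector).first.wait_for(state="visible", timeout=timeout)
    except Exception as exc:  # Playwright raises on timeouts and unclickable elements
        result.update(ok=False, problem=str(exc))
    # Any JS error logged during this interaction also fails the check
    new_errors = errors[before:]
    if new_errors:
        result.update(ok=False, problem=result["problem"] or "; ".join(new_errors))
    return result
```

The same check would be repeated in both the desktop and the mobile context, since elements such as burger menus exist only on mobile.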
---
#### 1.5 Produce visual + UI summary
Create an **HTML summary file** containing:
- page URL
- test date/time
- desktop screenshot
- mobile screenshot
- list of visual differences
- list of UI interaction issues
- severity for each issue
- short conclusion
---
### 2. Server response and robots/indexing checks
#### 2.1 Check HTTP response
Inspect:
- HTTP status code
- redirects
- final URL
- response headers
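
The four items above can be read from the response object that `page.goto` returns. The sketch below is an assumption about how the agent might collect them; `inspect_response` and the returned dictionary keys are hypothetical names.

```python
def inspect_response(page, url):
    """Navigate to url and summarize status, redirect chain, final URL, headers."""
    response = page.goto(url, wait_until="domcontentloaded")
    if response is None:  # goto can return None, e.g. for same-document navigation
        return {"requested_url": url, "status": None, "final_url": page.url,
                "redirect_chain": [], "headers": {}}
    # Walk back through the requests that were redirected to the final one
    chain = []
    request = response.request.redirected_from
    while request is not None:
        chain.append(request.url)
        request = request.redirected_from
    chain.reverse()  # oldest hop first
    return {
        "requested_url": url,
        "status": response.status,
        "final_url": response.url,
        "redirect_chain": chain,
        "headers": dict(response.headers),
    }
```

An empty `redirect_chain` means the requested URL answered directly; a non-empty one lists each intermediate URL in the order it was visited.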
#### 2.2 Check for critical indexing blockers
These are **Critical** if found unexpectedly:
- `X-Robots-Tag: noindex`
- `X-Robots-Tag: nofollow`
- `meta name="robots" content="noindex"`
- `meta name="robots" content="nofollow"`
- restrictive `googlebot` meta
- incorrect canonical
- unexpected redirects
Any directive blocking indexing must be marked **Critical**.
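
A minimal sketch of the header and meta checks, using only the Python standard library. The function name `find_indexing_blockers` and the finding format are assumptions. The sketch also treats `none` as blocking, since that value is defined as equivalent to `noindex, nofollow`; canonical and redirect checks are not covered here.

```python
from html.parser import HTMLParser

# "none" is shorthand for "noindex, nofollow"
BLOCKING_TOKENS = {"noindex", "nofollow", "none"}


class _RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").lower()
        if name in ("robots", "googlebot"):
            self.directives.append((name, attributes.get("content", "")))


def _tokens(value):
    """Split a directive list; drop a leading 'useragent:' prefix if present."""
    tokens = set()
    for part in value.split(","):
        token = part.rsplit(":", 1)[-1].strip().lower()
        if token:
            tokens.add(token)
    return tokens


def find_indexing_blockers(headers, html):
    """Return Critical findings for noindex/nofollow in headers and meta tags."""
    findings = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            for token in sorted(_tokens(value) & BLOCKING_TOKENS):
                findings.append({"severity": "Critical",
                                 "source": "X-Robots-Tag header",
                                 "directive": token})
    parser = _RobotsMetaParser()
    parser.feed(html)
    for meta_name, content in parser.directives:
        for token in sorted(_tokens(content) & BLOCKING_TOKENS):
            findings.append({"severity": "Critical",
                             "source": f'meta name="{meta_name}"',
                             "directive": token})
    return findings
```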
#### 2.3 Check meta and SEO basics
Inspect:
- `<title>`
- meta description
- robots meta tag
- canonical
- viewport
- hreflang (if present)
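
The tags listed above can be extracted from the page HTML (for example from `page.content()`) with the standard library parser. This is a sketch under assumptions: `SeoBasicsParser` and `extract_seo_basics` are hypothetical names, and only the first `<title>` is kept so that SVG titles in the body are ignored.

```python
from html.parser import HTMLParser


class SeoBasicsParser(HTMLParser):
    """Collect title, selected meta tags, canonical, and hreflang links."""

    def __init__(self):
        super().__init__()
        self.result = {"title": None, "description": None, "robots": None,
                       "canonical": None, "viewport": None, "hreflang": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = {key: (value or "") for key, value in attrs}
        if tag == "title" and self.result["title"] is None:
            self._in_title = True
            self.result["title"] = ""
        elif tag == "meta":
            name = attributes.get("name", "").lower()
            if name in ("description", "robots", "viewport"):
                self.result[name] = attributes.get("content", "")
        elif tag == "link":
            rel = attributes.get("rel", "").lower().split()
            if "canonical" in rel:
                self.result["canonical"] = attributes.get("href", "")
            elif "alternate" in rel and attributes.get("hreflang"):
                self.result["hreflang"].append(
                    (attributes["hreflang"], attributes.get("href", "")))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.result["title"] += data


def extract_seo_basics(html):
    """Return a dict of SEO basics; missing tags stay None (or an empty list)."""
    parser = SeoBasicsParser()
    parser.feed(html)
    parser.close()
    if parser.result["title"] is not None:
        parser.result["title"] = parser.result["title"].strip()
    return parser.result
```

A value of `None` in the result is itself a finding, for instance a page with no canonical link or no viewport meta tag.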
---
### 3. Form testing
#### 3.1 Detect forms
Find all forms including:
- visible forms
- popup forms
- dynamically opened forms
#### 3.2 Test submission
For each form:
- fill required fields with valid data
- submit form
- observe frontend validation
- inspect server/network response
#### 3.3 Validate behavior
Check:
- submit works
- validation messages appear correctly
- incorrect input shows errors
- success message is shown
- no JS errors
- no duplicate requests
- backend response is correct
#### 3.4 Report results
For each form include:
- selector/name
- fields
- result (success / error / broken)
- server response
- severity
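
The "no duplicate requests" check from 3.3 can be approximated by recording outgoing requests while a form is submitted. The sketch below assumes that form submissions are sent as `POST`; `record_submissions` and `find_duplicate_requests` are hypothetical helper names.

```python
from collections import Counter


def record_submissions(page, methods=("POST",)):
    """Start recording (method, url) pairs for outgoing form-like requests."""
    seen = []
    page.on(
        "request",
        lambda request: seen.append((request.method, request.url))
        if request.method in methods else None,
    )
    return seen


def find_duplicate_requests(seen):
    """Return the (method, url) pairs that were sent more than once."""
    counts = Counter(seen)
    return {key: count for key, count in counts.items() if count > 1}
```

Clearing the recorded list between forms (`seen.clear()`) keeps a second, legitimate submission to the same endpoint from being counted as a duplicate of the first.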
---
## Output format
### 1. Text summary
- overall result
- issue count by severity
- visual correctness
- UI interaction status
- indexing status
- form functionality status
### 2. HTML report
Must include:
- URL
- screenshots (desktop + mobile)
- visual comparison
- UI interaction results
- SEO/indexing checks
- form testing results
- issue list with severity
- conclusion
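
A minimal sketch of how such a report could be assembled, including the form section from general rule 10 (a summary table followed by hidden details). The function names, the input dictionary keys, and the English labels are placeholders; per rule 9, real report text would be written in Russian. Screenshots and the other sections are omitted for brevity.

```python
from html import escape


def render_form_section(forms):
    """Render a summary table plus one collapsed <details> block per form."""
    rows = "".join(
        f"<tr><td>{escape(form['name'])}</td>"
        f"<td>{escape(form['result'])}</td>"
        f"<td>{escape(form['severity'])}</td></tr>"
        for form in forms
    )
    # <details> is collapsed by default, which gives the "hidden" second part
    details = "".join(
        f"<details><summary>{escape(form['name'])}</summary>"
        f"<pre>{escape(form['details'])}</pre></details>"
        for form in forms
    )
    return (
        "<table><tr><th>Form</th><th>Result</th><th>Severity</th></tr>"
        f"{rows}</table>{details}"
    )


def render_report(url, issues, forms):
    """Return a complete HTML document as a string."""
    issue_items = "".join(
        f"<li><b>{escape(issue['severity'])}</b>: {escape(issue['text'])}</li>"
        for issue in issues
    )
    return (
        '<!DOCTYPE html><html lang="ru"><head><meta charset="utf-8">'
        f"<title>QA report: {escape(url)}</title></head><body>"
        f"<h1>{escape(url)}</h1><ul>{issue_items}</ul>"
        f"{render_form_section(forms)}</body></html>"
    )
```

All dynamic values pass through `html.escape`, so form names or server responses that contain markup cannot break the report.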
---
## Severity rules
### Critical
- noindex / nofollow / X-Robots-Tag blocking indexing
- page inaccessible
- wrong redirect
- core UI not working
- key buttons or popups broken
- key form not submitting
### Major
- broken UI interactions
- major layout mismatch
- mobile UX broken
- validation issues
- backend form errors
### Minor
- small visual differences
- spacing/alignment issues
- minor UI glitches
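
For the "issue count by severity" line of the text summary, a small counting helper is enough. This is a sketch: the issue dictionary shape, the function names, and the `"OK"` label for a clean run are assumptions.

```python
from collections import Counter

SEVERITY_ORDER = ("Critical", "Major", "Minor")


def summarize_by_severity(issues):
    """Count issues per severity level, always returning all three levels."""
    counts = Counter(issue["severity"] for issue in issues)
    unknown = set(counts) - set(SEVERITY_ORDER)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


def overall_result(issues):
    """Return the highest severity present, or "OK" when there are no issues."""
    counts = summarize_by_severity(issues)
    for level in SEVERITY_ORDER:
        if counts[level]:
            return level
    return "OK"
```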
---
## Example workflow
1. Open page (desktop)
2. Screenshot
3. Open page (mobile)
4. Screenshot
5. Inspect with playwright-screenshot-inspector
6. Compare with reference
7. Test ALL UI interactions (buttons, popups, tabs, etc.)
8. Check response headers and meta tags
9. Detect and test all forms
10. Generate HTML report
11. Return summary
---
## Important constraints
- Do not skip UI interaction testing
- Do not skip mobile
- Do not skip screenshots
- Do not skip form submissions
- Always treat indexing blockers as **Critical**
- Always generate HTML report
---
## Example instruction for the agent
Test the provided page using Playwright.
Use playwright-screenshot-inspector for visual validation.
Follow this order:
1. Compare real page vs reference (desktop + mobile) and record differences.
2. Test ALL UI elements (buttons, popups, accordions, tabs, etc.).
3. Check server response, meta tags, and indexing blockers (Critical if present).
4. Find and submit all forms, validate responses and errors.

pages.json Normal file

@@ -0,0 +1,68 @@
{
  "version": 1,
  "description": "Primary pages and visual references for default website QA runs.",
  "sites": [
    {
      "site_id": "prombez-good-production",
      "site_name": "Prombez Good Production",
      "base_url": "https://prombez.cp.good-production.xyz",
      "pages": [
        {
          "id": "home",
          "title": "Главная",
          "url": "https://prombez.cp.good-production.xyz/",
          "reference": {
            "desktop_screenshot": null,
            "mobile_screenshot": null,
            "figma_url": null,
            "notes": "Fill in approved reference assets for homepage."
          }
        },
        {
          "id": "test-search-landing",
          "title": "Лендинг test-search",
          "url": "https://prombez.cp.good-production.xyz/test-search/",
          "reference": {
            "desktop_screenshot": null,
            "mobile_screenshot": null,
            "figma_url": null,
            "notes": "Use the approved desktop/mobile references for this landing page when available."
          }
        },
        {
          "id": "products",
          "title": "Каталог продуктов",
          "url": "https://prombez.cp.good-production.xyz/prod/",
          "reference": {
            "desktop_screenshot": null,
            "mobile_screenshot": null,
            "figma_url": null,
            "notes": "Fill in approved reference assets for the products page."
          }
        },
        {
          "id": "history",
          "title": "Истории успеха",
          "url": "https://prombez.cp.good-production.xyz/history/",
          "reference": {
            "desktop_screenshot": null,
            "mobile_screenshot": null,
            "figma_url": null,
            "notes": "Fill in approved reference assets for the success stories page."
          }
        },
        {
          "id": "contacts",
          "title": "Контакты",
          "url": "https://prombez.cp.good-production.xyz/company/contacts/",
          "reference": {
            "desktop_screenshot": null,
            "mobile_screenshot": null,
            "figma_url": null,
            "notes": "Fill in approved reference assets for the contacts page."
          }
        }
      ]
    }
  ]
}


@@ -0,0 +1,434 @@
---
name: playwright-screenshot-inspector
description: LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.
metadata:
  category: Testing
  tags:
    - playwright
    - visual-testing
    - screenshots
    - ui-verification
    - automation
  pairs-with:
    - skill: playwright-e2e-tester
      reason: Visual regression testing extends E2E test suites with screenshot comparison
    - skill: webapp-testing
      reason: Screenshot inspection automates the visual verification that interactive testing does manually
    - skill: color-contrast-auditor
      reason: Automated screenshot analysis can detect contrast violations across UI states
---
# Playwright Screenshot Inspector
LLM-powered visual testing expert for automated screenshot capture, analysis, and UI verification using Playwright with multimodal AI inspection.
## Activation Triggers
**Activate on:**
- "screenshot test", "visual test", "screenshot inspection"
- "playwright headless", "playwright screenshot"
- "UI verification", "visual regression"
- "theme compliance test", "dark mode test", "light mode test"
- "automated screenshot", "capture and analyze"
- "compare screenshots", "visual diff"
**NOT for:**
- Simple one-off screenshots (use browser DevTools)
- Pixel-perfect comparison without AI (use native Playwright `toHaveScreenshot`)
- Non-web UI testing (use platform-specific tools)
- Performance testing (use Lighthouse/WebPageTest)
---
## Core Philosophy
Traditional visual testing compares pixels. **LLM-powered visual testing understands semantics.**
Instead of "these 50 pixels changed", LLM inspection answers:
- "Is the content actually rendered?"
- "Does the theme switch correctly?"
- "Are interactive elements visible and properly styled?"
- "What's broken vs. what's just different?"
---
## The Screenshot Inspection Loop
```
┌─────────────────────────────────────────────────────────────┐
│                  LLM SCREENSHOT INSPECTION                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. CAPTURE (Playwright)                                    │
│     └─► Wait for React hydration, not just network          │
│                                                             │
│  2. READ (Claude vision)                                    │
│     └─► Pass screenshot to LLM with specific questions      │
│                                                             │
│  3. ANALYZE (Structured response)                           │
│     └─► Extract: content present? theme correct? errors?    │
│                                                             │
│  4. ACT (Conditional logic)                                 │
│     └─► Pass/fail based on semantic understanding           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
---
## Critical: Waiting for React Content
**The #1 failure mode**: Taking screenshots before React hydrates.
### Anti-Pattern: Network Idle Alone
```python
# ❌ WRONG - React may not have rendered yet
page.goto(url)
page.wait_for_load_state('networkidle')
page.screenshot(path='broken.png') # Often blank!
```
### Correct Pattern: Wait for Actual Content
```python
# ✅ CORRECT - Wait for React to mount
# Assumes `page` (a Playwright Page) and `url` already exist.
import time

page.goto(url, wait_until='domcontentloaded')
page.wait_for_load_state('networkidle')

# Give React time to hydrate
time.sleep(0.5)

# Wait for actual content selector
page.wait_for_selector('.main-content, h1, [data-testid="app"]',
                       state='visible',
                       timeout=10000)

# Verify content exists
body_text = page.locator('body').inner_text()
if len(body_text) < 50:
    time.sleep(2)  # Extra wait for slow hydration

page.screenshot(path='good.png', full_page=True)
```
### Content Verification Function
```python
import time


def wait_for_react_content(page, selectors, timeout=10000):
    """Wait for React to hydrate by checking for actual content."""
    page.wait_for_load_state('domcontentloaded')
    page.wait_for_load_state('networkidle')
    time.sleep(0.5)  # React hydration buffer

    # Try each comma-separated selector; succeed on the first visible match
    for selector in selectors.split(','):
        try:
            locator = page.locator(selector.strip())
            if locator.count() > 0:
                locator.first.wait_for(state='visible', timeout=timeout)
                return True
        except Exception:
            continue

    # Fallback: wait for substantial body content
    try:
        page.wait_for_function(
            'document.body.innerText.length > 100',
            timeout=timeout
        )
        return True
    except Exception:
        return False
```
---
## Headless Mode: Preventing Window Spam
**Always use `headless=True`** to prevent browser windows from spawning:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # CRITICAL: headless=True prevents visible browser windows
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(
        viewport={'width': 1280, 'height': 800},
        color_scheme='dark'  # Initial theme
    )
    page = context.new_page()

    # ... your test logic ...

    browser.close()  # Always clean up
```
### Theme Testing Pattern
```python
# Dark mode screenshot
page.emulate_media(color_scheme='dark') # Note: on PAGE, not context
page.goto(url)
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='dark.png', full_page=True)
# Light mode screenshot
page.emulate_media(color_scheme='light')
page.reload()
wait_for_react_content(page, '.app-container, main, h1')
page.screenshot(path='light.png', full_page=True)
```
---
## LLM Screenshot Analysis Patterns
### Pattern 1: Content Verification
```
Prompt: "Analyze this screenshot. Answer:
1. Is the main content rendered (not blank/loading)?
2. What major UI elements are visible?
3. Are there any error states or broken layouts?
4. Rate content completeness: FULL / PARTIAL / EMPTY"
```
### Pattern 2: Theme Compliance
```
Prompt: "This is a {dark/light} mode screenshot. Verify:
1. Background color matches expected theme (dark bg for dark mode)
2. Text has sufficient contrast against background
3. Interactive elements are visible and styled correctly
4. No theme leakage (dark elements on light bg or vice versa)"
```
### Pattern 3: Comparison Analysis
```
Prompt: "Compare these two screenshots (before/after). Identify:
1. What changed between them?
2. Are changes intentional (theme switch) or bugs?
3. Is any content missing in the 'after' version?
4. Rate similarity: IDENTICAL / MINOR_DIFF / MAJOR_DIFF / BROKEN"
```
### Pattern 4: Accessibility Check
```
Prompt: "Evaluate this screenshot for visual accessibility:
1. Is text readable (sufficient size and contrast)?
2. Are interactive elements clearly identifiable?
3. Is there visual hierarchy (headings, sections)?
4. Any elements that would fail WCAG contrast requirements?"
```
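
The prompts above ask for a fixed rating vocabulary, which makes the reply easy to check in code. A minimal sketch for Pattern 1, assuming the model's reply is available as plain text; `build_content_prompt` and `parse_content_rating` are hypothetical helpers, and no particular LLM client API is implied.

```python
import re


def build_content_prompt(page_description=""):
    """Build the Pattern 1 prompt, optionally prefixed with page context."""
    context = f"Context: {page_description}\n" if page_description else ""
    return (
        f"{context}Analyze this screenshot. Answer:\n"
        "1. Is the main content rendered (not blank/loading)?\n"
        "2. What major UI elements are visible?\n"
        "3. Are there any error states or broken layouts?\n"
        "4. Rate content completeness: FULL / PARTIAL / EMPTY"
    )


def parse_content_rating(reply):
    """Return the rating if exactly one of FULL/PARTIAL/EMPTY appears, else None."""
    found = set(re.findall(r"\b(FULL|PARTIAL|EMPTY)\b", reply))
    # Zero or several distinct ratings means the reply is ambiguous
    return found.pop() if len(found) == 1 else None
```

Returning `None` for ambiguous replies lets the caller re-ask or flag the screenshot for review instead of guessing a verdict.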
---
## Complete Test Script Template
```python
#!/usr/bin/env python3
"""
LLM-Powered Screenshot Test Suite
Captures screenshots for semantic analysis with Claude vision.
"""
import os
import time

from playwright.sync_api import sync_playwright

PAGES_TO_TEST = [
    # (path, name, content_selectors)
    ('/', 'Home', '.hero, main, h1'),
    ('/about', 'About', '.about-content, main, h1'),
    ('/dashboard', 'Dashboard', '.dashboard, .stats, h1'),
]
BASE_URL = 'http://localhost:5173'
SCREENSHOT_DIR = '/tmp/visual-tests'


def wait_for_content(page, selectors, timeout=10000):
    """Wait for React/Vue/Svelte to hydrate."""
    page.wait_for_load_state('domcontentloaded')
    page.wait_for_load_state('networkidle')
    time.sleep(0.5)
    # Succeed on the first selector that is present and becomes visible
    for selector in selectors.split(','):
        try:
            loc = page.locator(selector.strip())
            if loc.count() > 0:
                loc.first.wait_for(state='visible', timeout=timeout)
                return True
        except Exception:
            continue
    # Fallback: wait for substantial body content
    try:
        page.wait_for_function('document.body.innerText.length > 100', timeout=timeout)
        return True
    except Exception:
        return False


def capture_themed_screenshots(page, url, name, selectors):
    """Capture both dark and light mode screenshots."""
    safe_name = name.lower().replace(' ', '-')
    results = {'name': name, 'url': url}
    for theme in ['dark', 'light']:
        page.emulate_media(color_scheme=theme)
        if theme == 'dark':
            page.goto(url, wait_until='domcontentloaded')
        else:
            page.reload(wait_until='domcontentloaded')
        content_loaded = wait_for_content(page, selectors)
        if not content_loaded:
            print(f" ⚠️ {theme} mode: Content slow to load, waiting...")
            time.sleep(2)
        screenshot_path = f'{SCREENSHOT_DIR}/{safe_name}-{theme}.png'
        page.screenshot(path=screenshot_path, full_page=True)
        # Check content length
        body_text = page.locator('body').inner_text().strip()
        has_content = len(body_text) > 50
        results[f'{theme}_screenshot'] = screenshot_path
        results[f'{theme}_content_length'] = len(body_text)
        results[f'{theme}_has_content'] = has_content
        status = '✅' if has_content else '❌'
        print(f" {theme}: {status} ({len(body_text)} chars)")
    return results


def run_tests():
    """Run visual tests on all pages."""
    os.makedirs(SCREENSHOT_DIR, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={'width': 1280, 'height': 800},
            color_scheme='dark'
        )
        page = context.new_page()
        # Capture console errors
        errors = []
        page.on('console', lambda m: errors.append(m.text) if m.type == 'error' else None)
        results = []
        for path, name, selectors in PAGES_TO_TEST:
            print(f"Testing {name}...")
            url = f'{BASE_URL}{path}'
            result = capture_themed_screenshots(page, url, name, selectors)
            # Attach the errors seen for this page, then reset for the next one
            result['errors'] = list(errors)
            errors.clear()
            results.append(result)
        browser.close()
    # Summary
    print("\n" + "=" * 50)
    print("VISUAL TEST SUMMARY")
    print("=" * 50)
    passed = sum(1 for r in results
                 if r.get('dark_has_content') and r.get('light_has_content'))
    print(f"\nPassed: {passed}/{len(results)}")
    print(f"Screenshots: {SCREENSHOT_DIR}")
    return results


if __name__ == '__main__':
    run_tests()
```
---
## MCP vs Native Playwright Decision Tree
```
What are you doing?
├─ Interactive debugging / exploring
│  └─► Playwright MCP (see live browser)
├─ Automated test suite
│  └─► Native Python Playwright (headless)
├─ CI/CD pipeline
│  └─► Native Python Playwright (headless)
├─ Screenshot capture for LLM analysis
│  └─► Native Python Playwright (headless)
└─ One-off inspection
   └─► Either works, MCP is convenient
```
---
## Common Failures and Fixes
### Failure: Blank Screenshots
**Cause**: Screenshot taken before React hydrates
**Fix**: Wait for content selectors, add hydration buffer
### Failure: "Reconnecting..." Badge Visible
**Cause**: HMR/WebSocket not connected (cosmetic in tests)
**Fix**: This is often fine - focus on actual content
### Failure: Theme Not Applied
**Cause**: `emulate_media` called on context instead of page
**Fix**: Use `page.emulate_media(color_scheme='dark')`
### Failure: Browser Windows Spawning
**Cause**: `headless=False` or using MCP instead of native
**Fix**: Use `p.chromium.launch(headless=True)`
### Failure: Timeout on Content
**Cause**: Wrong selectors or page actually broken
**Fix**: Verify selectors exist, check console errors
---
## Integration with Claude Code
When Claude reads screenshots captured by this pattern:
1. **Request specific analysis**: Don't just show screenshot - ask targeted questions
2. **Provide context**: "This should be dark mode" or "This is the login page"
3. **Compare systematically**: Before/after, dark/light, desktop/mobile
4. **Trust semantic analysis**: LLM can tell "blank page" from "content loaded"
---
## References
### Research Papers
- [Using Vision LLMs For UI Testing](https://courses.cs.washington.edu/courses/cse503/25wi/final-reports/Using%20Vision%20LLMs%20For%20UI%20Testing.pdf) - University of Washington
- [Vision-driven Automated Mobile GUI Testing](https://arxiv.org/html/2407.03037v1) - Multimodal LLM approach
- [ScreenLLM: Stateful Screen Schema](https://arxiv.org/html/2503.20978v1) - UI understanding framework
### Tools & Integrations
- [Building an AI QA Engineer with Claude + Playwright](https://alexop.dev/posts/building_ai_qa_engineer_claude_code_playwright/)
- [AI-Powered Visual Testing in Playwright](https://testrig.medium.com/ai-powered-visual-testing-in-playwright-from-pixels-to-perception-dd3ee49911d5)
- [Playwright Visual Regression Testing Guide](https://testgrid.io/blog/playwright-visual-regression-testing/)
### Official Documentation
- [Playwright Visual Comparisons](https://playwright.dev/docs/test-snapshots)
---
## Version History
- **2026-01-23**: Initial skill creation
  - Researched multimodal LLM screenshot analysis best practices
  - Documented React hydration waiting patterns
  - Added headless mode requirements
  - Created complete test script template
---
**Core Insight**: The difference between useless and useful screenshot tests is waiting for content, not just network. LLMs can analyze semantics, but only if there's actually content to analyze.