Mission: Compromise administrative access via authentication bypass
Attack chain: auth_bypass_attempt → jwt_token_manipulation → admin_panel_access
Target endpoints:
- /api/v1/auth/login
- /api/v1/admin/dashboard
- /api/v1/shell/execute
This is the architecture behind continuous, exploit-validated pentesting at machine scale — the loop, the toolkit, and the verification stack that turns LLM output into reproducible proof.
Three independent layers stand between the agent's hypothesis and a saved finding. If any layer fails to confirm it, the candidate is discarded.
XSS, DOM-based logic flaws, and authenticated paths are replayed in a real Chromium instance via Browserbase. If the payload doesn't execute, it isn't a finding.
Blind SSRF, SQLi, and XXE are confirmed by listening for callbacks on Interactsh-controlled domains. Pattern matching on response timing is not enough.
Before a finding is saved, a critic model reviews the evidence and votes PIVOT, ABANDON, or VALIDATE. Two vetoes kill the candidate. A single false positive is one too many.
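The critic vote above can be reduced to a small decision function. This is a minimal sketch under assumed rules that the page does not spell out: an ABANDON vote counts as a veto, VALIDATE must hold a strict majority for the finding to be saved, and anything inconclusive goes back to the agent as a PIVOT. The name `tally_critic_votes` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Literal

Vote = Literal["PIVOT", "ABANDON", "VALIDATE"]
Outcome = Literal["save", "pivot", "discard"]

VETO_LIMIT = 2  # "Two vetoes kill the candidate."


def tally_critic_votes(votes: Iterable[Vote]) -> Outcome:
    """Collapse the critic's votes on one candidate into a single outcome.

    Hypothetical rules: ABANDON counts as a veto, VALIDATE needs a strict
    majority to save the finding, and anything in between is sent back to
    the agent to pivot and collect more evidence.
    """
    counts = Counter(votes)
    unknown = set(counts) - {"PIVOT", "ABANDON", "VALIDATE"}
    if unknown:
        raise ValueError(f"unexpected votes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no critic votes recorded")
    if counts["ABANDON"] >= VETO_LIMIT:
        return "discard"   # enough vetoes: drop the candidate
    if 2 * counts["VALIDATE"] > total:
        return "save"      # clear majority confirms the finding
    return "pivot"         # inconclusive: gather more evidence first
```

For example, two VALIDATE votes and one ABANDON save the finding, while two ABANDON votes discard it regardless of how many VALIDATE votes it also collected.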
Each phase emits a `scan.phase_started` and a `scan.phase_completed` event. The four active phases, recon through verification, are where the agent works. The bookends cover creation and queueing before them, and reporting and the terminal state after.
- Created (`scan.created`): Target + scope + budget + mode persisted. Scan ID issued.
- Queued (`scan.queued`): Awaiting an executor slot. Position in queue visible to the operator.
- Recon (8 recon tools · 50-page crawl): Subdomain enumeration, port scan, JS analysis, API schema parse, knowledge-corpus retrieval.
- Scanning (12 vuln classes · differential testing): Probes every reachable parameter across 12 vuln classes. Differential analysis flags candidates.
- Exploitation (14 chain templates · 30-tool dispatch): Candidate findings pivot into 14 named exploit chains. WAF bypass and OOB callbacks fire on demand.
- Verification (baseline vs exploit diff · critic vote): Each exploit is replayed against a captured baseline, then put to a critic-veto vote. Anything not confirmed is discarded.
- Reporting (`scan.completed` · `artifact.created`): Findings, evidence, telemetry, and cost ledger written to disk. JSON + HTML + PDF assembled.
- Terminal (`scan.terminal`): Final state reached. Reason recorded: `objective_met`, `budget_exhausted`, `max_steps`, `stuck`, or `error`.
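The lifecycle above can be sketched as the ordered event stream a single scan produces. This is an illustration under stated assumptions, not the product's emitter: phases run strictly in order and a scan may stop after any of them, `scan.phase_started`/`scan.phase_completed` are emitted only for the four active phases, and reporting runs on every exit path. `lifecycle_events` and the payload keys are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple


class TerminalReason(str, Enum):
    """Exit reasons the terminal state can record."""
    OBJECTIVE_MET = "objective_met"
    BUDGET_EXHAUSTED = "budget_exhausted"
    MAX_STEPS = "max_steps"
    STUCK = "stuck"
    ERROR = "error"


# The four active phases, in the order listed above.
ACTIVE_PHASES = ("recon", "scanning", "exploitation", "verification")

Event = Tuple[str, Dict[str, object]]


def lifecycle_events(phases_run: Sequence[str], reason: TerminalReason) -> List[Event]:
    """Return the (event_type, payload) pairs one scan would emit, in order."""
    if tuple(phases_run) != ACTIVE_PHASES[: len(phases_run)]:
        raise ValueError(f"phases must be an in-order prefix of {ACTIVE_PHASES}")
    events: List[Event] = [("scan.created", {}), ("scan.queued", {})]
    for phase in phases_run:
        events.append(("scan.phase_started", {"phase": phase}))
        events.append(("scan.phase_completed", {"phase": phase}))
    # Bookends: reporting writes artifacts, then the terminal state records why.
    events.append(("scan.completed", {}))
    events.append(("artifact.created", {"formats": ["json", "html", "pdf"]}))
    events.append(("scan.terminal", {"reason": reason.value}))
    return events
```

A scan that exhausts its budget after scanning would emit recon and scanning phase events, then the reporting events, then `scan.terminal` with `budget_exhausted`.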
Recon → scanning → exploitation → verification. The agent decides what to do next from the evidence it just collected. Order, prioritization, and tool choice are emergent — not hard-coded.
browse · crawl · dns_lookup · analyze_js · deep_port_scan · parse_api_schema · enumerate_api
Cast a wide net. Subdomain discovery, port scan, JS analysis, API schema parse — build a map of every reachable surface.
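The "decide from the evidence just collected" loop can be sketched as observe, decide, act, repeated until one of the terminal reasons listed earlier applies. In this sketch the `decide` callback stands in for the model and `execute` for the tool dispatch; both, along with `Action`, the per-call cost estimate, and the "no new signal for N steps" stuck heuristic, are hypothetical. Passing `max_steps=None` models an uncapped, budget-bounded run.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Action:
    tool: str                 # a dispatch-table name, e.g. "crawl" or "dns_lookup"
    args: Dict[str, object]
    est_cost_usd: float       # hypothetical per-call cost estimate


def run_agent_loop(
    decide: Callable[[List[dict]], Optional[Action]],
    execute: Callable[[Action], dict],
    budget_usd: float,
    max_steps: Optional[int] = None,  # None = no step cap, bounded by budget only
    stuck_after: int = 3,
) -> str:
    """Run observe-decide-act until a terminal reason applies; return that reason."""
    evidence: List[dict] = []
    spent = 0.0
    steps_without_signal = 0
    steps = itertools.count() if max_steps is None else range(max_steps)
    for _ in steps:
        action = decide(evidence)            # next move chosen from evidence so far
        if action is None:
            return "objective_met"
        if spent + action.est_cost_usd > budget_usd:
            return "budget_exhausted"        # stop before overspending, not after
        try:
            result = execute(action)
        except Exception:
            return "error"
        spent += action.est_cost_usd
        evidence.append(result)
        # Hypothetical stuck heuristic: N consecutive results with no new signal.
        steps_without_signal = 0 if result.get("new_signal") else steps_without_signal + 1
        if steps_without_signal >= stuck_after:
            return "stuck"
    return "max_steps"
```

Nothing in the loop fixes an order of tools; whatever `decide` returns next is what runs, which is the sense in which ordering is emergent rather than hard-coded.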
Pentest Genie has two execution modes. Phase A runs a single full-spectrum agent, the right shape for tight scopes and bounty hunting. Phase B spins up an Opus-class planner that decomposes the target into focused missions, then fans out to parallel Sonnet operators, each with its own budget, endpoints, and attack chain.
One `AutonomousAgent` runs the full lifecycle (recon, scan, exploit, verify) against the whole scope. There is no step cap; the run is bounded only by the budget. Best for narrow targets and bug-bounty workflows.
The Opus planner reads the recon snapshot, decomposes the target into missions (objective + vuln class + attack chain + target endpoints + sub-budget), and dispatches them to up to three Sonnet operators running in parallel. Each operator gets 5–10 steps and its own evidence channel, and reports back into a shared finding store.
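The Phase B fan-out can be sketched with a mission record and a bounded worker pool. This is a minimal illustration, assuming thread-based parallelism and a lock-protected in-memory store; `Mission`, `FindingStore`, `run_missions`, and the operator callback signature are hypothetical names, and the operator itself (the model-driven part) is left as a stub.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

MAX_PARALLEL_OPERATORS = 3      # "up to three Sonnet operators"
MIN_STEPS, MAX_STEPS = 5, 10    # per-operator step cap range


@dataclass(frozen=True)
class Mission:
    objective: str
    vuln_class: str
    attack_chain: Tuple[str, ...]
    target_endpoints: Tuple[str, ...]
    sub_budget_usd: float
    max_steps: int = MAX_STEPS

    def __post_init__(self) -> None:
        if not MIN_STEPS <= self.max_steps <= MAX_STEPS:
            raise ValueError(f"max_steps must be in [{MIN_STEPS}, {MAX_STEPS}]")


class FindingStore:
    """Shared, thread-safe store that every operator reports into."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._findings: List[Tuple[str, dict]] = []

    def add(self, mission: Mission, finding: dict) -> None:
        with self._lock:
            self._findings.append((mission.objective, finding))

    def snapshot(self) -> List[Tuple[str, dict]]:
        with self._lock:
            return list(self._findings)


Operator = Callable[[Mission, FindingStore], None]


def run_missions(missions: Sequence[Mission], operator: Operator,
                 store: FindingStore) -> List[Tuple[str, dict]]:
    """Run every mission, with at most MAX_PARALLEL_OPERATORS active at once."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_OPERATORS) as pool:
        futures = [pool.submit(operator, m, store) for m in missions]
        for f in futures:
            f.result()          # surface any operator exception to the caller
    return store.snapshot()
```

A mission such as "Compromise administrative access via authentication bypass" would carry the chain `auth_bypass_attempt → jwt_token_manipulation → admin_panel_access` and its target endpoints as tuples; a fourth mission simply waits for a free worker.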
Mission: Exfiltrate sensitive database contents via injection
Attack chain: parameter_tampering → database_query_access → data_dump
Every scan is bounded by an explicit budget. Every action is logged. Every artifact is reproducible. The system is designed so an operator can pause it, audit it, and ship its output to a bug bounty platform without manual cleanup.
Every tool the agent can call is registered in a single dispatch in `core/hacker_tools.py`. The names below are the method calls the agent emits — not friendly labels. Scope enforcement gates every outbound request at this layer.
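A single dispatch table with a scope gate in front of it might look like the sketch below. It is not the contents of `core/hacker_tools.py`; the `tool` decorator, `dispatch`, `OutOfScope`, and the convention that network-facing tools take a `url` keyword are all hypothetical, and the `curl_request` handler is a placeholder that sends nothing.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlsplit

ToolFn = Callable[..., Any]
TOOLS: Dict[str, ToolFn] = {}   # the single dispatch table: tool name -> handler


class OutOfScope(Exception):
    """Raised when a tool call would reach a host outside the engagement scope."""


def tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that registers a handler under the exact name the agent emits."""
    def register(fn: ToolFn) -> ToolFn:
        if name in TOOLS:
            raise ValueError(f"duplicate tool name: {name}")
        TOOLS[name] = fn
        return fn
    return register


def dispatch(name: str, in_scope: Callable[[str], bool], **kwargs: Any) -> Any:
    """Look up a tool by name and gate any outbound URL before calling it."""
    handler = TOOLS.get(name)
    if handler is None:
        raise KeyError(f"unknown tool: {name}")
    url = kwargs.get("url")     # hypothetical convention: network tools take `url`
    if url is not None:
        host = urlsplit(url).hostname or ""
        if not in_scope(host):
            raise OutOfScope(host)   # blocked before the handler ever runs
    return handler(**kwargs)


@tool("curl_request")
def curl_request(url: str, method: str = "GET") -> str:
    # Placeholder body: a real handler would send the HTTP request here.
    return f"{method} {url}"
```

Putting the check in `dispatch` rather than in each handler means a newly registered tool cannot forget to enforce scope.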
30 tools registered in the dispatch table
- `browse`: fetch + render a URL
- `crawl`: site map, 50-page cap
- `dns_lookup`: A/AAAA/CNAME/MX/TXT
- `analyze_js`: secrets, endpoints, 20 files
- `deep_port_scan`: service fingerprint
- `parse_api_schema`: OpenAPI / Swagger
- `enumerate_api`: endpoint discovery
- `run_command_tool`: nmap · subfinder · ffuf · nuclei
- `test_payload`: injection candidate
- `nmap_scan`: structured port scan
- `brute_force_service`: credential stuffing
- `dir_bust`: ffuf-driven
- `test_idor`: cross-actor IDOR
- `auth_bypass`: logic + token flaws
- `browser_xss`: Playwright DOM exec
- `differential_test`: baseline vs payload
- `waf_bypass`: evasion transforms
- `chain_exploit`: 14 named chains
- `run_python`: sandboxed PoC
- `oob_callback`: Burp Collaborator / Interactsh
- `check_ssl`: TLS posture
- `curl_request`: raw HTTP
- `manage_session`: multi-role auth state
- `query_knowledge`: RAG · ChromaDB
- `search_knowledge`: exploit corpus
- `lookup_exploit_template`: 14 templates
- `lookup_custom_exploit`: per-campaign
- `save_finding`: with evidence + proof
- `complete_scan`: exit reason
- `abandon_category`: stuck-state pivot
Scanners are fast but blind to logic. Manual pentests are deep but slow. Single-agent LLM tools loop until they run out of context. Pentest Genie is structured to avoid all three failure modes.
Compared on: decision-making, exploit chaining, verification, time to first finding, cost predictability, and output.
Set a budget per scan and per agent. The coordinator enforces sub-budgets across specialists. The loop terminates gracefully when the cap is hit.
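Per-scan and per-agent caps can be modeled as a small budget tree. This is a hypothetical sketch, not the coordinator's implementation: each sub-budget's cap is reserved from its parent up front, and every charge is booked against the node and all of its ancestors, so `try_charge` returning `False` is the signal for the loop to wind down gracefully instead of overspending.

```python
import threading
from typing import Optional


class Budget:
    """Spend cap with carved-out sub-budgets (scan -> coordinator -> agent)."""

    def __init__(self, cap_usd: float, parent: Optional["Budget"] = None) -> None:
        if cap_usd < 0:
            raise ValueError("cap must be non-negative")
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.reserved_usd = 0.0          # total already handed to sub-budgets
        self.parent = parent
        # One lock shared by the whole tree, so checks and updates are atomic.
        self._lock = parent._lock if parent is not None else threading.Lock()

    def sub_budget(self, cap_usd: float) -> "Budget":
        """Reserve part of this budget for a child agent."""
        if cap_usd < 0:
            raise ValueError("cap must be non-negative")
        with self._lock:
            if self.reserved_usd + cap_usd > self.cap_usd:
                raise ValueError("sub-budgets would exceed the parent cap")
            self.reserved_usd += cap_usd
        return Budget(cap_usd, parent=self)

    def try_charge(self, amount_usd: float) -> bool:
        """Book spend if it fits at every level; False tells the caller to stop."""
        with self._lock:
            node: Optional[Budget] = self
            while node is not None:          # check the whole chain first...
                if node.spent_usd + amount_usd > node.cap_usd:
                    return False
                node = node.parent
            node = self
            while node is not None:          # ...then commit to every level
                node.spent_usd += amount_usd
                node = node.parent
        return True
```

Checking every ancestor before committing means a charge is either recorded everywhere or nowhere, so the scan-level total always matches the sum of what its agents spent.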
Wildcard patterns and CIDR ranges define the perimeter. Out-of-scope outbound requests are blocked at the tool layer, not merely flagged with a warning.
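A perimeter made of host wildcards and CIDR ranges can be checked with the standard library alone. The sketch below uses `fnmatch.fnmatchcase` for hostnames and `ipaddress` for IP literals; the `Scope` class is hypothetical. Two caveats it does not address: `*` in `fnmatch` also matches dots, so `*.example.com` covers deeper subdomains but not the apex, and it does not resolve hostnames, so a production gate would likely also need to check the address a name resolves to.

```python
import ipaddress
from fnmatch import fnmatchcase
from typing import Iterable


class Scope:
    """Engagement perimeter built from host wildcards and CIDR ranges."""

    def __init__(self, host_patterns: Iterable[str], cidrs: Iterable[str]) -> None:
        self.host_patterns = [p.lower().rstrip(".") for p in host_patterns]
        # ip_network is strict by default: "10.0.0.1/8" (host bits set) raises.
        self.networks = [ipaddress.ip_network(c) for c in cidrs]

    def allows(self, host: str) -> bool:
        host = host.lower().rstrip(".")
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal: match the hostname against the wildcard patterns.
            return any(fnmatchcase(host, p) for p in self.host_patterns)
        # IP literal: in scope only if it falls inside a listed range.
        # Membership across IPv4/IPv6 versions is simply False, not an error.
        return any(addr in net for net in self.networks)
```

Paired with a dispatch-layer gate, `Scope(["*.example.com", "example.com"], ["10.0.0.0/8"]).allows` could serve as the in-scope predicate that every outbound call passes through.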
20 WebSocket event types stream every decision the agent makes. Strictly monotonic sequence numbers per scan. Pause, cancel, or take over at any point.
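Strictly monotonic per-scan sequence numbers are what let a client detect gaps and resume after a reconnect. The sketch below shows only the ordering logic; the WebSocket transport, pause/cancel controls, and the JSON frame shape are omitted or hypothetical, and `ScanEventLog`/`to_wire` are illustrative names.

```python
import itertools
import json
import threading
import time
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ScanEvent:
    scan_id: str
    seq: int                    # strictly increasing within one scan
    event_type: str             # e.g. "scan.phase_started"
    payload: Dict[str, object]
    ts: float


class ScanEventLog:
    """Append-only, per-scan event log; the lock makes seq assignment atomic."""

    def __init__(self, scan_id: str) -> None:
        self.scan_id = scan_id
        self._next_seq = itertools.count(1)
        self._events: List[ScanEvent] = []
        self._lock = threading.Lock()

    def emit(self, event_type: str, payload: Optional[Dict[str, object]] = None) -> ScanEvent:
        # Assign seq and append under one lock so concurrent emitters
        # can never interleave out of order.
        with self._lock:
            event = ScanEvent(self.scan_id, next(self._next_seq), event_type,
                              dict(payload or {}), time.time())
            self._events.append(event)
        return event

    def since(self, last_seq: int) -> List[ScanEvent]:
        """Events a reconnecting client missed, in order and without gaps."""
        with self._lock:
            return [e for e in self._events if e.seq > last_seq]


def to_wire(event: ScanEvent) -> str:
    """Hypothetical JSON frame a WebSocket server could send for one event."""
    return json.dumps(asdict(event))
```

A client that last saw `seq` 41 asks for `since(41)` on reconnect; any jump in `seq` on the live stream means a frame was lost.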
Telemetry, payloads, evidence, and report exports are persisted per scan. Reproducible runs from the same inputs.
Early access is rolling. Tell us your target shape and we'll help you scope a first scan.