The Problem
Most AI tools are black boxes. You give them input, you get output, and you have no idea what happened in between. That works fine for simple tasks. But for complex analysis, the thinking matters as much as the result.
I wanted to build something that could tear down a product the way a PM actually would. Not just "here's what I see" but the full process: research the company, analyze the design, find the positioning, critique the strategy, challenge those critiques, resolve the debates, and generate actual recommendations.
And I wanted it to be transparent. You should be able to watch it think.
The Architecture
Twister uses a multi-agent pipeline. Each agent has a specific job, and they pass information to each other in sequence. The key insight is that good analysis requires different modes of thinking, and it's hard to get one model to do all of them well in a single prompt.
Here's the pipeline:
- Research: pulls public information about the company (web search via Serper)
- Vision: analyzes a screenshot of the page for layout, hierarchy, color, typography, and CTAs
- Positioning: identifies the value prop, target audience, and differentiation
- Critique: synthesizes the findings into a strategic critique
- Devil's Advocate: challenges that critique
- Synthesis: resolves the debate and assigns confidence scores
- Strategy: prioritizes recommendations into immediate, short-term, and long-term actions
- Generative: produces alternative copy and wireframes
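To make the handoff concrete, here's a minimal sketch of the sequential pattern: each agent is an async function that reads the shared context and writes its own slice back. The type names and fields are illustrative, not the actual Twister code.

```typescript
// Illustrative only: the agent names and context shape are simplified for this post.
type AnalysisContext = {
  url: string;
  research?: string;
  vision?: string;
  positioning?: string;
  critique?: string;
  challenges?: string;
  synthesis?: string;
};

type Agent = {
  name: string;
  run: (ctx: AnalysisContext) => Promise<Partial<AnalysisContext>>;
};

// Each agent reads what it needs from the context and contributes one slice.
async function runPipeline(
  agents: Agent[],
  ctx: AnalysisContext,
  onProgress: (agentName: string) => void
): Promise<AnalysisContext> {
  for (const agent of agents) {
    onProgress(agent.name); // lets the UI show which agent is currently active
    const output = await agent.run(ctx);
    ctx = { ...ctx, ...output };
  }
  return ctx;
}
```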
The UI shows each agent as it runs. You can see which one is active and when it hands off to the next. That's the "open architecture" part. It's not just about the output. It's about proving the system actually works.
The Agent Debate Pattern
This is the part I'm most proud of.
After the Critique Agent synthesizes all the findings, the Devil's Advocate Agent comes in and challenges everything. It finds holes in the analysis, offers counterpoints, and identifies risks in the recommendations.
Then the Synthesis Agent resolves the debate. It decides which findings survived scrutiny, which ones need more research, and assigns confidence scores to each conclusion.
This matters because AI models are confident even when they're wrong. Building in structured disagreement forces the system to actually think about what it knows versus what it's guessing.
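Here's a rough sketch of how that structured disagreement can be wired up. The prompts and field names are placeholders for illustration, not the actual Twister prompts.

```typescript
// Sketch of the debate step; prompt text and field names are placeholders.
type ResolvedFinding = {
  claim: string;
  status: "survived" | "needs_more_research";
  confidence: number; // 0-100
  reasoning: string;
};

async function debate(
  critique: string,
  llm: (prompt: string) => Promise<string>
): Promise<ResolvedFinding[]> {
  // 1. Devil's Advocate: attack the critique instead of extending it.
  const challenges = await llm(
    `You are a devil's advocate. Find holes, counterpoints, and risks in this analysis:\n${critique}`
  );

  // 2. Synthesis: decide which findings survive scrutiny and score confidence.
  const resolution = await llm(
    `Given the analysis and the challenges below, return a JSON array of
     { claim, status: "survived" | "needs_more_research", confidence: 0-100, reasoning }.
     Analysis:\n${critique}\nChallenges:\n${challenges}`
  );

  // In practice this needs the fallback parsing described later in the post.
  return JSON.parse(resolution) as ResolvedFinding[];
}
```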
I ran Twister against my own product, SnapShelf. The Critique Agent identified "pricing transparency in CTAs" as a strength. Then the Devil's Advocate caught a contradiction: if pricing is transparent, why is the "Download" button ambiguous for a paid app? The analysis called one element a strength while flagging a related element as a weakness. That's exactly the kind of inconsistency a human reviewer would catch.
The Synthesis Agent resolved it by noting that CTA clarity "is highly dependent on the product's pricing model" and flagged it as an uncertainty requiring more context. The final confidence score for UX analysis was 80%, but competitive positioning only got 60% because there wasn't enough public data to validate the differentiation claims.
That's intellectually honest output. The system knows what it knows and what it doesn't.
Sample Output
Here's what Twister produced when I pointed it at snapshelfapp.com:
Vision Analysis: The Vision Agent broke down layout, hierarchy, color scheme, typography, and CTA effectiveness. It identified that the dark aesthetic conveyed a "pro tool feel" and that the visual hierarchy was strong. It also called out that product screenshots had small details that could be hard to read, and suggested adding a demo video. Here's the thing: the landing page already has mp4 videos, but the Vision Agent only had a static screenshot to work with, so it couldn't see them. That's a known limitation of the current approach and something I'd address by adding browser automation to capture dynamic content.
Positioning Analysis: Identified the core value prop ("stop losing files"), target audience (macOS users who struggle with digital clutter), and differentiation (filling an "unmet need" in the macOS ecosystem). Noted that the headline "The Missing Shelf" is evocative but could be more immediately descriptive of the product's core function.
Devil's Advocate Highlights:
- Caught the contradiction between "pricing transparency" and "ambiguous Download CTA"
- Challenged whether the "Download" button is actually confusing, or if it's a standard free trial entry point
- Suggested the minimalist design might be intentional to reinforce the "Clutter kills focus" message
- Questioned whether adding extensive social proof would actually help, or just add visual noise that contradicts the product's anti-clutter positioning
Strategy Recommendations: Prioritized into immediate, short-term, and long-term. Immediate actions included clarifying the Download CTA and adding essential legal links. Short-term included adding testimonials. Long-term included building out an "About Us" page and exploring a free trial option. (The system also suggested producing a demo video, not realizing the page already has them.)
Generative Output: Five alternative CTAs (including "Get Shelf Now, $4.99 Limited Time" and "Stop Losing Files, Buy Shelf for $4.99") and five alternative headlines (including "Shelf: Your Smart macOS File Organizer" and "Declutter Your Mac. Find Files Instantly."). It also generated a wireframe suggesting a revised hero section with a "Watch Demo" button and star ratings for social proof.
Tech Stack
I built Twister with Next.js 14 using the App Router. The agent orchestration runs server-side and streams updates to the client so you can watch progress in real time. Each agent is a separate async function that calls Google's Gemini API.
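In sketch form, the streaming piece looks roughly like this. The module path, event payloads, and helper names are illustrative (it reuses the runPipeline shape from the earlier sketch), not the exact production code.

```typescript
// app/api/analyze/route.ts -- illustrative path and event shapes.
import { agents, runPipeline } from "@/lib/pipeline"; // hypothetical module wrapping the earlier sketch

export async function POST(req: Request) {
  const { url } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const send = (event: object) =>
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));

      // Each handoff emits an event so the client can render which agent is active.
      const result = await runPipeline(agents, { url }, (agentName) =>
        send({ type: "agent_started", agent: agentName })
      );

      send({ type: "done", result });
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```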
The vision analysis uses Gemini's multimodal capabilities. I capture a screenshot with ScreenshotOne, convert it to base64, and send it directly to the model. The research agent uses Serper for web search, then Gemini synthesizes the results into structured data.
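Here's a simplified sketch of that flow. The SDK (@google/generative-ai), the model name, and the ScreenshotOne parameters shown are representative rather than exact.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Representative sketch: model name and ScreenshotOne parameters are simplified.
async function analyzeScreenshot(targetUrl: string): Promise<string> {
  // 1. Capture a screenshot of the page via ScreenshotOne.
  const shotUrl =
    `https://api.screenshotone.com/take?access_key=${process.env.SCREENSHOTONE_KEY}` +
    `&url=${encodeURIComponent(targetUrl)}&format=png`;
  const res = await fetch(shotUrl);
  const base64 = Buffer.from(await res.arrayBuffer()).toString("base64");

  // 2. Send the image plus an analysis prompt to Gemini's multimodal endpoint.
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
  const result = await model.generateContent([
    "Analyze this landing page: layout, hierarchy, color, typography, CTA effectiveness.",
    { inlineData: { data: base64, mimeType: "image/png" } },
  ]);

  return result.response.text();
}
```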
The UI is React with Tailwind and Framer Motion. The color scheme is inspired by storm radar displays. The name is Twister because a tornado tears things down, and the pipeline is shaped like a funnel: wide intake at the top, narrow output at the bottom.
What I Learned
Building this taught me how to think about AI as a system, not just a feature. The hard part isn't calling an API. It's designing the flow: what information does each agent need, what should it output, how do you handle disagreement between agents, and how do you make the whole thing observable.
I also learned that structured output is everything. Getting a model to return valid JSON every time requires careful prompting and fallback parsing. Most of the edge case bugs were JSON parsing issues.
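A minimal version of that fallback parsing looks like this (the real edge cases are messier):

```typescript
// Minimal fallback parser: strip markdown fences, then fall back to grabbing
// the outermost JSON object if a direct parse fails.
function parseModelJson<T>(raw: string): T | null {
  const cleaned = raw.replace(/`{3}(?:json)?/g, "").trim();
  try {
    return JSON.parse(cleaned) as T;
  } catch {
    const start = cleaned.indexOf("{");
    const end = cleaned.lastIndexOf("}");
    if (start !== -1 && end > start) {
      try {
        return JSON.parse(cleaned.slice(start, end + 1)) as T;
      } catch {
        return null;
      }
    }
    return null;
  }
}
```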
The agent debate pattern is something I want to use more. It's a simple idea but it genuinely improves output quality. Most AI products would benefit from building in some form of self-critique.
What I'd Add Next
If I kept building, here's where I'd go:
- Multi-page crawling: Auto-detect Pricing, Features, and About pages. Analyze the whole product, not just the homepage.
- Competitor mode: Enter one URL, Twister finds competitors, screenshots them, and generates a side-by-side comparison.
- Twister Score: A single 0-100 rating that synthesizes everything into one comparable number.
- Browser automation: Use Playwright to actually click through the product, record flows, and analyze dynamic states.
The foundation is there. The architecture scales.
Links
GitHub: github.com/brehndanknox/twister