The Problem
A friend of mine blows glass. He's good at it. But he was frustrated with one part of the process: getting quality photos of his finished work.
He sent me some examples. The pieces were beautiful, but the photos didn't do them justice. Bad lighting, cluttered backgrounds, inconsistent quality. For someone trying to sell their art, this matters.
I immediately thought: this is a perfect use case for Gemini Flash. Fast, cheap, accurate. But there was a problem.
Why "Just Use AI" Wasn't the Answer
My friend is a successful businessman, but he's not technical. I could have told him to upload his photos to Gemini and write prompts himself. He could have figured it out eventually.
But he'd never get consistent output. Every generation would look different. If he needed to photograph the same piece from multiple angles, the backgrounds and lighting would vary wildly between shots. For product photography, that's a dealbreaker.
I wanted to build something that made it simple. Upload a photo, pick some options, get a studio-quality result. Every time.
Designing for Simplicity
The interface needed to be dead simple. Three steps:
1. Upload a photo of the glass piece (even a rough phone photo works).
2. Select a surface from presets: marble pedestal, glass table, wood slab, etc.
3. Select an environment from presets: studio lighting, nature backdrop, branded background, etc.
Hit generate. The app isolates the glass piece from the original photo, places it on the selected surface, and composites it into the environment. The output looks like a professional studio shot.
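Under the hood, that generate step can be a single image-editing call to Gemini Flash: the original photo goes in alongside the assembled prompt, and the composited shot comes back. Here's a rough sketch of what that call might look like, assuming the google-genai Python SDK; the model name is a placeholder, and the prompt string comes from the preset pipeline described in the next section.

```python
from io import BytesIO

from PIL import Image
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

def generate_studio_shot(photo_path: str, prompt: str) -> Image.Image:
    """Send the rough phone photo plus the assembled prompt to Gemini Flash
    and return the composited studio-style image."""
    source = Image.open(photo_path)
    response = client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",  # placeholder; use the current image-capable Flash model
        contents=[prompt, source],
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    # The response can interleave text and image parts; return the first image.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return Image.open(BytesIO(part.inline_data.data))
    raise RuntimeError("Model returned no image")
```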
The Consistency Problem
Getting good output once is easy. Getting consistent output every time is hard.
The key was controlling exactly what the model received. I couldn't just pass natural language prompts directly, because small variations in wording would produce wildly different results.
So I built a structured prompt pipeline. Every generation starts with an isolation block that instructs the model to identify and extract the glass piece from its background. Then a placement block specifies the surface. Then an environment block describes the scene.
All presets are stored as JSON. When a user selects "Marble Pedestal" and "Studio Lighting," the app pulls the exact specifications from structured data and assembles the prompt. Same input, same output. Every time.
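To make that concrete, here's a minimal sketch of the structured data and the assembly step. The preset names, field layout, and wording are my own illustration, not the app's actual schema; the point is that every piece of the prompt comes from fixed data rather than from whatever the user happens to type.

```python
# Illustrative preset data -- the real app stores these in JSON files.
SURFACES = {
    "marble_pedestal": {
        "description": "a white Carrara marble pedestal with softly rounded edges",
        "height": "waist height",
    },
    "wood_slab": {
        "description": "a live-edge walnut slab with a matte oil finish",
        "height": "tabletop height",
    },
}

ENVIRONMENTS = {
    "studio_lighting": {
        "backdrop": "a seamless neutral gray studio backdrop",
        "lighting": "a large softbox key light from the upper left with gentle fill from the right",
    },
    "nature_backdrop": {
        "backdrop": "an out-of-focus forest clearing at golden hour",
        "lighting": "warm directional sunlight with soft shadows",
    },
}

ISOLATION_BLOCK = (
    "Identify the glass art piece in the supplied photo. Preserve its exact shape, "
    "colors, transparency, and internal patterns. Remove the original background, "
    "surface, and any clutter entirely."
)

def build_prompt(surface_key: str, environment_key: str) -> str:
    """Assemble the isolation, placement, and environment blocks into one prompt."""
    surface = SURFACES[surface_key]
    env = ENVIRONMENTS[environment_key]
    placement_block = (
        f"Place the isolated piece on {surface['description']} at {surface['height']}, "
        "with a physically plausible contact shadow and subtle reflection."
    )
    environment_block = (
        f"Set the scene against {env['backdrop']}, lit by {env['lighting']}. "
        "Render the result as a professional product photograph."
    )
    return "\n\n".join([ISOLATION_BLOCK, placement_block, environment_block])

print(build_prompt("marble_pedestal", "studio_lighting"))
```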
Adding Custom Prompts
Presets cover 80% of use cases. But sometimes you want something specific. A volcano erupting in the background. A beach at sunset. A specific branded environment.
So I added custom prompting. The user types natural language ("base of a volcano, lava and eruption in background"), and the app parses it into the same JSON format the presets use. This keeps the structured pipeline intact while giving users creative freedom.
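One way to do that parsing is to have the model itself translate the free-form text into the same fields the presets use, with JSON output enforced. A rough sketch, again assuming the google-genai SDK and a placeholder model name; the schema mirrors the illustrative preset fields above.

```python
import json

from google import genai
from google.genai import types

client = genai.Client()

SCHEMA_HINT = (
    "Return JSON with exactly these keys: "
    '"surface" (an object with "description" and "height") and '
    '"environment" (an object with "backdrop" and "lighting").'
)

def parse_custom_prompt(user_text: str) -> dict:
    """Convert free-form scene text into the same structured format the presets use."""
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # text-only call; model name is a placeholder
        contents=(
            "Describe a product-photography scene for a glass art piece based on this "
            f"request: {user_text!r}. {SCHEMA_HINT}"
        ),
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return json.loads(response.text)

# e.g. parse_custom_prompt("base of a volcano, lava and eruption in background")
```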
Users can save their custom prompts to reuse later. If you find a setup that works for your brand, you can turn it into a personal preset with one click.
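Saving a setup then amounts to storing that parsed JSON under a user-chosen name alongside the built-in presets, so future generations reuse the exact same structured data. A minimal sketch; the file name and layout are my own assumptions.

```python
import json
from pathlib import Path

PRESET_FILE = Path("user_presets.json")  # hypothetical location for personal presets

def save_personal_preset(name: str, parsed: dict) -> None:
    """Append a parsed custom prompt to the user's personal preset store."""
    presets = json.loads(PRESET_FILE.read_text()) if PRESET_FILE.exists() else {}
    presets[name] = parsed
    PRESET_FILE.write_text(json.dumps(presets, indent=2))
```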
The Results
Output quality is staggeringly good. Cost per generation is almost nothing. And results stay consistent from one generation to the next, which was the whole point.
My friend was blown away. He showed it to other glass artists at his studio. They were equally impressed. The overall sentiment: they'd pay to use this for their own work.
I didn't build this to make money. I built it to help a friend showcase his work the way it deserves to be shown. Maybe someday I'll release it publicly. For now, my friend has an edge, and that's enough for me.
What I Learned
This project taught me how to think about AI as a product component, not just a feature. The model itself is powerful, but raw access to it isn't a product. The product is the experience around it: the constraints that ensure consistency, the interface that makes it accessible, the presets that encode expertise.
Most people can't prompt their way to consistent, high-quality output. The job of the product is to do that for them.