Let's say we want to make a subagent that tests frontends. This would be a browser-use agent. So what should the main orchestrator do? The topology is as follows:

  1. Implement changes
  2. Decide how to test the changes and generate detailed test plans
  3. Launch browser-use agents that actually run those tests
  4. Each browser-use agent executes its plan and reports back
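
A minimal sketch of this topology, assuming a stub `run_browser_agent` in place of a real browser-use harness; all names here are hypothetical:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class TestPlan:
    name: str
    steps: list[str]  # e.g. "click copy", "check clipboard contents"

async def run_browser_agent(plan: TestPlan) -> dict:
    # Placeholder: a real implementation would spawn a browser-use agent
    # (Playwright, the browser-use library, etc.) with the plan as its task.
    return {"plan": plan.name, "completed": True, "verdict": "pass"}

async def orchestrate(change_summary: str) -> list[dict]:
    # Step 2: in a real system an LLM would turn `change_summary` into
    # detailed test plans; hardcoded here to keep the sketch self-contained.
    plans = [
        TestPlan("copy button", ["click copy", "check clipboard contents"]),
        TestPlan("navbar tab", ["click the Pricing tab", "check the right tab is active"]),
    ]
    # Step 3: fan the plans out to browser-use subagents in parallel.
    return await asyncio.gather(*(run_browser_agent(p) for p in plans))

if __name__ == "__main__":
    print(asyncio.run(orchestrate("added copy-to-clipboard button")))
```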

When testing vibe-coded apps, there are two kinds of checks:

  1. Functional: does the app behave as expected when interacted with?
    1. Does this copy button actually save to my clipboard?
    2. Does clicking this tab in the navbar actually go to the correct tab?
  2. Visual: does the rendered page look right?
    1. Is there text overflow?
    2. Is anything off-center?
    3. Are gradients as expected?
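
One hypothetical way to encode this split in the test plan, since the two kinds need different verdict handling (the names are assumptions, not an existing API):

```python
from dataclasses import dataclass
from enum import Enum

class CheckKind(Enum):
    FUNCTIONAL = "functional"  # interact with the app, then assert behavior
    VISUAL = "visual"          # render the page, then judge appearance

@dataclass
class Check:
    kind: CheckKind
    action: str    # what the agent should do, e.g. "click the copy button"
    expected: str  # crisp pass/fail condition for functional; looser for visual

PLAN = [
    Check(CheckKind.FUNCTIONAL, "click the copy button", "clipboard contains the snippet"),
    Check(CheckKind.FUNCTIONAL, "click Pricing in the navbar", "Pricing tab is active"),
    Check(CheckKind.VISUAL, "look at the hero section", "no text overflow, heading centered"),
    Check(CheckKind.VISUAL, "look at the background", "gradient matches the design"),
]
```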

How should we prompt the agent to actually test this? How will the agent know that it has completed testing?
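
One answer, sketched below: put the checklist and an explicit completion criterion directly in the subagent's prompt, so "done" means "every check has a recorded verdict". The template is an assumption, not a known-good prompt:

```python
# Hypothetical prompt template for a browser-use test subagent. The key move
# is to define completion explicitly: the agent is done when every check has
# a recorded verdict, not when it "feels" finished.
PROMPT_TEMPLATE = """\
You are testing a web app at {url}.

Work through each check below. For FUNCTIONAL checks, perform the interaction
and record pass/fail against the expected behavior. For VISUAL checks, take a
screenshot and judge it against the expectation.

Checks:
{checks}

You are finished only when every check has a verdict. Reply with JSON:
{{"results": [{{"check": ..., "completed": ..., "verdict": ..., "evidence": ...}}]}}
"""

def render_prompt(url: str, checks: list[str]) -> str:
    return PROMPT_TEMPLATE.format(
        url=url,
        checks="\n".join(f"- {c}" for c in checks),
    )
```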

The agent outcome for visual checks will almost always be “success”, because the agent can complete its task and then check the visual stuff regardless of how the page looks; the actual visual verdict has to be reported separately. The agent outcome for functional checks can be a genuine failure, if the agent couldn't actually perform the action.
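
This is why the subagent's result schema should separate task completion from the verdict. A hedged sketch (field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    check: str
    completed: bool  # did the agent finish its steps? (nearly always True for visual)
    verdict: str     # "pass" or "fail": the actual judgment
    evidence: str    # screenshot path, clipboard contents, etc.

def interpret(result: CheckResult, kind: str) -> str:
    # For functional checks, failing to complete the steps is itself a failure.
    if kind == "functional" and not result.completed:
        return "fail"
    # For visual checks, completion tells us nothing; only the verdict counts.
    return result.verdict
```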

Benchmarks:

  • Mind2Web
    • a multimodal variant exists that includes screenshots
  • Mind2Web 2
    • targets “agentic search”
  • real evals
    • not public

Midtraining (VQA tasks on web UIs):