We evaluate DeepCode on the PaperBench benchmark (released by OpenAI), a rigorous testbed requiring AI agents to independently reproduce 20 ICML 2024 papers from scratch. The benchmark comprises 8,316 ...
Specification by Example is an agile approach to delivering software where the requirements are defined as executable specifications. Teams identify the scope of the work and illustrate the intended ...
Anthropic releases its Agent Skills framework as an open standard, with Microsoft, OpenAI, Atlassian, and Figma already adopting the technology that teaches AI assistants to do specialized work.
Concept and specs for a visual work-from-home (WFH) status indicator for remote workers, meant to reduce interruptions in shared homes by showing when you're on a call, focused, or available. Open to community-driven ...