Will AI agents be able to regularly code small features for us in a year?

Premium

333

Ṁ410k

Jul 2

98% chance


I'm thinking of something like https://mentat.ai/, but that actually works.

I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review and leave only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.

I tried this yesterday and it failed haha:
https://github.com/manifoldmarkets/manifold/pull/2694

See more discussion in my post:

https://jamesgrugett.com/p/software-automation-will-make-us

#AI

#Manifold Business Future


27 Comments


Here's another good one-shot PR from Cursor's background agent. It adds the ability for admins/mods to 'delete' spam comments so that they aren't rendered at all, unlike the 'hide' feature, which still renders the hidden comments: https://github.com/manifoldmarkets/manifold/pull/3600

This took a minute to prompt, 5 minutes for Cursor to come up with a solution, and 5-10 minutes to test to make sure it worked.

This was a really good experience! I used Cursor's background agent to add a minimum bet filter to the trades tab, and it produced a good start in 5 minutes. Then I tested it and prompted it to replace pagination with infinite scroll. Done in less than 20 minutes! https://github.com/manifoldmarkets/manifold/pull/3599

bought Ṁ5,000 YES

This looks good to me. Stephen gave it two prompts to create this, and I think it took less than 10 minutes: https://github.com/manifoldmarkets/manifold/pull/3588

@ian Looks like we need another prompt to fix the type error, should come in well under 30 mins still, though

bought Ṁ50 NO

@ian Initial comment was more than 30 minutes ago, so this is a failure

bought Ṁ1,000 NO

bought Ṁ2,500 YES at 93%

@CalibratedNeutral oh we stopped paying attention

@CalibratedNeutral I don't know if Stephen told it to fix the type error

@ian the key to vibe-coding is to stay just the right amount drunk and not to overdo it

bought Ṁ50 YES

I think Claude 4 with GitHub does what the mentat.ai thing you linked does

bought Ṁ250 NO

@ian do you have access to ChatGPT Plus or Pro, and would you be willing to see how codex-1 fares? It's currently only accessible on Pro and Teams iirc, but will probably be accessible to Plus before the market closes

bought Ṁ5,000 YES

GPT 4.1 is awesome for coding.

It's genuinely really good. (mini is ok, nano is dogwater). I have been using it via Azure with Cursor, both as an assistant and as a tedious-implementation speedrunner. It has one-shotted so many instructions that 4o would have a bad time with, and that Claude would overthink.

Not tab complete, mostly just asking stuff. Really has come a long way with code

Crazy how AI agents are building small features for me almost daily and this market is still at 80%

@DarklyMade is this code peer reviewed?

@Kire_ of course! The peer review AI looks at it!

I'd like to conduct some tests using codebuff/cursor. What are acceptable small features in your mind? I have a couple ideas:
- Add a button to the comment's bottom row that allows users to tip the commenter. Denormalize the tip amount onto the comment and display the total tipped amount on the button.
- Add a delete button for admins/mods that marks a comment as deleted (don't actually delete the comment, just set both the deleted and hidden flags) and hides the comment completely from the market.
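
To make the two ideas above concrete, here is a minimal TypeScript sketch of the data handling each one implies. The `MarketComment` type, its field names (`hidden`, `deleted`, `totalTipped`), and the helper functions are all hypothetical, chosen for illustration; they are not taken from Manifold's actual schema or from any of the linked PRs.

```typescript
// Hypothetical comment shape; these field names are assumptions for illustration.
type MarketComment = {
  id: string
  text: string
  hidden?: boolean
  deleted?: boolean
  totalTipped?: number // denormalized sum of all tips on this comment
}

// Idea 1: record a tip by updating the denormalized total stored on the comment,
// so the button can show it without summing individual tip records.
function addTip(comment: MarketComment, amount: number): MarketComment {
  if (amount <= 0) throw new Error('Tip amount must be positive')
  return { ...comment, totalTipped: (comment.totalTipped ?? 0) + amount }
}

// Idea 2: soft delete. The record is kept; both flags are set.
function markDeleted(comment: MarketComment): MarketComment {
  return { ...comment, deleted: true, hidden: true }
}

// Deleted comments are not rendered at all; hidden-only comments stay in the list.
function renderableComments(comments: MarketComment[]): MarketComment[] {
  return comments.filter((c) => !c.deleted)
}

const thread: MarketComment[] = [
  { id: 'a', text: 'Great market!' },
  { id: 'b', text: 'spam spam spam' },
  { id: 'c', text: 'off-topic', hidden: true },
]

const tipped = addTip(addTip(thread[0], 10), 25)
console.log(tipped.totalTipped) // 35

const moderated = thread.map((c) => (c.id === 'b' ? markDeleted(c) : c))
console.log(renderableComments(moderated).map((c) => c.id)) // ids 'a' and 'c'
```

A real implementation would also need a permission check for admins/mods and a persisted write; both are left out of this sketch.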

@JamesGrugett said the delete comment button for spam fit the bill, I'll try using codebuff to do this soon

@ian a "view results" button on polls?

@cthor Also seems reasonable!

@ian I am aware that you work on Manifold, but since you are also the largest YES holder, can we maybe agree to let @JamesGrugett do these kinds of evaluations when the time comes?

@CalibratedNeutral That sounds reasonable, although he doesn't work at Manifold anymore, so I'm not sure if he'll want to put 30 mins in to do this. I was going to film my attempt from scratch

@CalibratedNeutral I was not aware of that. Then maybe a third party (another developer working on Manifold)? The stakes are reasonably high for me, so I really would strongly prefer to have everything as unbiased as possible.

@CalibratedNeutral We might be able to get @SG or @SirSalty to do it

@CalibratedNeutral Alternatively, @JamesGrugett could test this question on his new startup, codebuff. He uses codebuff to help develop codebuff

@ian Either option sounds good to me as long as the resolution criteria are followed according to @JamesGrugett's judgement

@ian how tf did you get the dead head badge?
