Object and Bounding Box Fine-tuning

Nov 12, 2024

With the versatile and compact Florence 2

3 Comments

I am amazed this works with such a small set. One way to reduce the tedium of manual annotation is to have a stronger object detection model do it for you. Grounding DINO is one of these. You can try it on Hugging Face Spaces, e.g. https://huggingface.co/spaces/merve/Grounding_DINO_demo

It is slow on Hugging Face, but it can be downloaded and run on your own PC with a GPU, or on RunPod, vast.ai, or Colab.

I just tried it with "chess piece" and it did a pretty good job. It is not smart enough to know what a black king is, though, so you would need to manually change the annotation class for Ronan's sample chess piece demo. It is a zero-shot detector, so it can work with objects outside the standard classes in datasets like COCO. With IDEs like Cursor or (my fave) Cline you can also generate Python code to quickly cycle through a dataset and change, add, or delete bounding boxes and class names without too much fuss. Grounding DINO is by no means perfect, so it will make errors, but it can dramatically reduce the amount of manual annotation needed.
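To make the "cycle through a dataset and change/delete annotations" idea concrete, here is a minimal sketch in plain Python. The annotation layout (a list of dicts with `image`, `label`, and `bbox` keys), the file name, the box values, and the helper name `apply_corrections` are all assumptions for illustration; they are not Grounding DINO's output format or anything from the original post.

```python
from typing import Dict, List, Optional

# Hypothetical annotation record: {"image": str, "label": str, "bbox": [x, y, w, h]}
Annotation = Dict[str, object]


def apply_corrections(
    annotations: List[Annotation],
    corrections: Dict[int, Optional[str]],
) -> List[Annotation]:
    """Return a new list with labels replaced or boxes dropped.

    `corrections` maps an annotation index to a new class name,
    or to None when the box should be deleted.
    """
    fixed: List[Annotation] = []
    for index, annotation in enumerate(annotations):
        if index in corrections:
            new_label = corrections[index]
            if new_label is None:
                continue  # drop a spurious detection
            # Copy the record so the original detections stay untouched.
            annotation = {**annotation, "label": new_label}
        fixed.append(annotation)
    return fixed


# Made-up detections, all carrying the generic prompt text as their label.
detections = [
    {"image": "board_01.jpg", "label": "chess piece", "bbox": [34, 50, 40, 80]},
    {"image": "board_01.jpg", "label": "chess piece", "bbox": [120, 48, 42, 85]},
    {"image": "board_01.jpg", "label": "chess piece", "bbox": [0, 0, 5, 5]},
]

# Manual review: name the first two pieces, delete the third box.
corrections = {0: "black king", 1: "white queen", 2: None}
cleaned = apply_corrections(detections, corrections)

for annotation in cleaned:
    print(annotation["image"], annotation["label"], annotation["bbox"])
```

In practice the corrections would come from a small review UI or a hand-edited file rather than a hard-coded dict; the point is only that the detector supplies the boxes and a human supplies the specific class names.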

Trelis Research

Nov 13

Nice, just tried Grounding DINO. Indeed, that's a good option for getting the bounding box if there's just one piece; it struggles more with multiple pieces. I think you're right, though, that generating Python code (or a UI, as I did) is a solid option, and if you have a good UI it's pretty fast to annotate (probably only a tad slower than having to label bounding boxes). See here: https://github.com/TrelisResearch/install-guides/blob/main/Grounding%20DINO%20on%20three%20chess%20pieces.png

Just took a look at Cline; pretty insane that it's open source. The funny thing is, Cursor still makes sense because I pay $20 per month and get 500 premium requests. If my average prompt length is very long (15k+ tokens), then that makes economic sense compared with paying as I go (which is what I end up doing for the second half of each month, when I run out of requests).

One interesting option might be Qwen 2.5 Coder on Fireworks, for <$1/MM tokens... so a third the price of Sonnet.
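As a rough check on the numbers above, here is a back-of-the-envelope sketch. The $20 and 500 requests come from the comment; the $3 per million tokens for Sonnet is an assumption inferred from "a third the price", and the calculation counts prompt (input) tokens only, ignoring output tokens, so treat it as indicative rather than exact.

```python
def pay_as_you_go_cost(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of one request, counting prompt tokens only."""
    return prompt_tokens * usd_per_million_tokens / 1_000_000


# Figures from the comment: $20 per month buys 500 premium requests.
SUBSCRIPTION_USD = 20.0
INCLUDED_REQUESTS = 500
subscription_cost_per_request = SUBSCRIPTION_USD / INCLUDED_REQUESTS

# Assumed per-million-token prices (not quoted exactly in the comment).
SONNET_USD_PER_MILLION = 3.0  # assumption: three times the "<$1" figure
QWEN_USD_PER_MILLION = 1.0    # assumption: upper bound of "<$1/MM tokens"

prompt_tokens = 15_000
sonnet_cost = pay_as_you_go_cost(prompt_tokens, SONNET_USD_PER_MILLION)
qwen_cost = pay_as_you_go_cost(prompt_tokens, QWEN_USD_PER_MILLION)

# Prompt length at which pay-as-you-go Sonnet matches the subscription.
break_even_tokens = subscription_cost_per_request / SONNET_USD_PER_MILLION * 1_000_000

print(f"Subscription: ${subscription_cost_per_request:.3f} per request")
print(f"Sonnet pay-as-you-go at {prompt_tokens} tokens: ${sonnet_cost:.3f}")
print(f"Qwen pay-as-you-go at {prompt_tokens} tokens: ${qwen_cost:.3f}")
print(f"Break-even prompt length vs Sonnet: {break_even_tokens:.0f} tokens")
```

Under these assumptions the subscription works out to $0.04 per request, so it only beats pay-as-you-go Sonnet once prompts pass roughly 13k tokens, which fits the "15k+ tokens" remark; at the assumed Qwen price, a 15k-token prompt would cost less than the subscription rate.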

Andrew Walsh

Nov 13 (edited)

I am not specifically talking about chess pieces. I doubt anyone looking at this would be. If people are interested in more popular classes like "person", there are better options than Grounding DINO, such as DETA. DETA could also benefit from fine-tuning for a specific class and has no language component at all. Language is not needed if you are looking for specific classes and fine-tune on them (e.g. person, parcel, etc.). I'd love to see you do a DETA fine-tune!

I love Cursor and use it all the time. The price is excellent, and I have really hammered it and don't seem to run out of requests. I must have, but the extra delay does not seem significant. Sometimes even Claude 3.5 Sonnet just gives up and repeats itself. This is where Cline can help. Even though the best model is still Claude, being agentic it seems to be able to figure stuff out better than normal Claude. Not sure why, but it does. Also, with OpenRouter under Cline you can choose from a huge variety of models, including free ones, and swap between them in the same conversation (yes, Cursor does this too). Cline can get very expensive fairly quickly, though, if you use Sonnet, and even more so with GPT versions. Qwen Coder does not seem to work with Cline/OpenRouter ATM, though, and just gave me garbage results when I tried it.

So my practice is Cursor first, then Cline. I will probably cancel my separate GPT and Claude subscriptions since they are covered by Cursor/Cline. Cline is a stupid name BTW; it sounds like JetBrains CLion. Before that they called it Claude Dev. Even stupider.

PS. DETA may need far more training samples for decent results though. Check https://huggingface.co/spaces/hysts/DETA