Andre Louis

A demonstration of offline and online LLM image description responses

I have one image, and two very differing descriptions of said image. The first is from , a local model installed on my , and the second is from . The GPT response is so huge that I'm going to have to post it as threaded replies, but it shows the vast difference between models. For this first post, the Llava13B response has been included as .

1/

You've presented an image that appears to be a screenshot of a digital audio workstation (DAW) software interface, which is used for recording, editing, and producing audio files.

Here's a detailed breakdown of the various components visible in the screenshot:

1. **Track Header Area (on the left)**:
- This area contains two audio track controls. Each track has a label, which by default might be the name of the instrument or sound source. In this case, the labels are "Piano" and "Komplete", which could refer to a piano track and possibly a track using Native Instruments' Komplete plugins or instrument sounds.
- Each track has an indicator showing a musical note, suggesting that these are MIDI tracks or perhaps tracks designed for musical data.

- There are several controls associated with each track:
  - A record enable button ('R') that is typically used to arm the track for recording.
  - Solo ('S') and mute ('M') buttons that allow the user to listen to individual tracks in isolation or to mute them, respectively.
  - A volume slider to adjust the track's output level.
  - A pan control (the knob next to the volume slider) that adjusts the stereo positioning of the track's sound.
- Both track headers have numbers, '1' and '2', which are likely the identifiers for the individual tracks in the session.

2. **Track Content Area (in the middle)**:
- This is the main workspace where audio or MIDI recordings and edits are displayed. However, in this screenshot, the area is empty, indicating that there are no audio clips, MIDI data, or automation currently placed on the timeline.
- A timeline with measure numbers runs along the top, starting from measure '1' and continuing to at least '15'. This suggests that the project is in a musical time format, where time is divided into measures and beats, common in music production.
- The vertical lines indicate the division of measures and seem to correspond to a standard 4/4 time signature, as there are four smaller divisions between the bold numbered lines, which typically represent each beat in a measure.

3. **Top Toolbar**:

- On the very top left, there is a '+' button, likely for adding new tracks or other elements to the project.
- Next to it are a set of transport controls, common in DAW software, allowing the user to play, stop, record, and navigate the timeline, although the specific buttons are not fully visible.
- The playhead or cursor is shown at the beginning of the timeline, indicating the current position within the project; here, it's at the start of the first measure.

The actual software brand is not clearly identifiable from the image due to the lack of distinctive logos or software-specific interface designs. However, many popular DAWs like Logic Pro, Pro Tools, Ableton Live, and others have similar layouts and could appear like this with custom themes or color schemes.

Without additional context or different views, this is as detailed an analysis as the visible elements of the image allow.

@FreakyFwoof Oh that GPT description is so incredible! I once let Be My AI describe a statgraph of a basketball game and didn't expect much. What I got was a very detailed description, score, rebounds and everything.

@riikkakannisto Exactly. I'm very impressed by that response.

@FreakyFwoof Wow. That ... is ... something. It's amazing how we'll end up determining which tool to use. I love using the ChatGPT NVDA add-on for getting screenshot descriptions. My current comp isn't good enough for local GPT4All/llama.cpp. Hoping to solve this pretty soon.

@ppatel Yeah. My prompt for GPT is quite extensive so it gives a very good description.

@ppatel @FreakyFwoof It's going to be ironic how blind people are going to have one of the clearest cases for getting high-powered GPUs. :)

(Yes, I'm looking into a new computer myself.)

@modulux @ppatel I'll be doing that for my next build as well. Yep.

@modulux @FreakyFwoof It's fortuitous for me that my current laptop is 7 years old and that I have to buy a new machine this year for more than one reason. I'm leaning heavily toward a MacBook Pro while keeping this Windows machine for some of my needs.

@FreakyFwoof Wow, that's amazing!

@TalkingDroid Definitely one of the best responses I've seen from this.

@FreakyFwoof could you send me the model?
I can't find a correct one that will work.

@patricus 'ollama run llava:13b'
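
For anyone who would rather script this than type into the interactive prompt, here is a minimal sketch, not something from the thread itself. It assumes Ollama is running locally on its default port (11434) and that the `llava:13b` model has already been pulled with the command above. The helper names `build_payload` and `describe_image` and the default prompt text are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="llava:13b"):
    """Build the JSON body for one non-streaming image description request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON reply instead of a stream of chunks.
        "stream": False,
    }


def describe_image(path, prompt="Describe this image in detail."):
    """Send the image at `path` to the local model and return its description."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("screenshot.png")` (a hypothetical file name) should return the model's description as plain text, provided the server is up and the model is installed.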

@FreakyFwoof Yeah, that's why you pay premium. :) I'm very sure even if OpenAI open-sourced GPT-4V, people wouldn't even be able to run it on consumer hardware.

@FreakyFwoof Very detailed and explanatory description there. I like it.
I’ve been noticing that I’ve been arguing with AI quite regularly lately. Yesterday it blatantly told me I was wrong in telling it what the object in the photo was. All I was trying to do was get info about whether the entire object was in the frame, and if specific elements were clearly visible, so that I knew it was fine to share.

@cassana But you must be wrong because you're only human, and AI is AI, right? Right? 😂

@FreakyFwoof Right! It’s that almighty superbrain that has access to the whole universe of information known as the World Wide Web of Lies. But it’ll tell you so politely: “I’m sorry, but that object isn’t in the photo”, with an implied very sotto voce “you ignorant monkey”.