QuadratureSurfer

joined 1 year ago
[–] [email protected] 1 points 3 months ago

Sure, but this is just a more visual example of how compression using an ML model can work.

The time you spend reworking the prompt, or tweaking the steps/cfg/etc. is outside of the scope of this example.

And if we're really talking about creating a good pic, it helps to use tools like ControlNet/inpainting/etc... which could still be communicated to the receiving machine, but then you start to lose some of the compression: roughly another 1KB for every additional time you need to run the model to get the correct picture.

[–] [email protected] 0 points 3 months ago* (last edited 3 months ago) (2 children)

You also have to keep in mind that the more you compress something, the more processing power you're going to need.

Whatever compression algorithm is proposed will also need to be able to handle the data in real-time and at low power.

But you are correct that compression beyond 200x is absolutely achievable.

A more visual example of compression could be something like one of the Stable Diffusion AI/ML models. The model may only be a few gigabytes, but you can generate an insane number of images that go well beyond that initial model size. And as long as someone else uses the same model, input, and seed, they can generate the exact same image you did. So instead of transmitting the entire 4k image itself, you just have to tell them the prompt along with a few variables (the seed, the CFG scale, the number of steps, etc.), and they can generate the same 4k image on their own machine, looking exactly like the one you generated on yours.

So basically, for only ~~a few bits~~ about a kilobyte, you can get 20+MB worth of data transmitted this way. The drawback is that you need a powerful computer and a lot of energy to regenerate those images, which brings us back to the problem of conveying this data in real-time at low power.
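
As a rough sketch of what the "receiving" side looks like with the Hugging Face diffusers library (the model name, prompt, and settings here are just placeholders, not anything specific):

```python
import torch
from diffusers import StableDiffusionPipeline

# Both sender and receiver load the exact same model weights
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model; both sides must match
    torch_dtype=torch.float16,
).to("cuda")

# Everything below is the "payload": the prompt plus a handful of small parameters
prompt = "a photo of an astronaut riding a horse on the moon"
seed = 1234567890
generator = torch.Generator("cuda").manual_seed(seed)

image = pipe(
    prompt,
    generator=generator,
    num_inference_steps=30,   # steps
    guidance_scale=7.5,       # CFG scale
    height=512,
    width=512,
).images[0]

image.save("reconstructed.png")  # same model + same payload -> same image
```

In practice, bit-exact reproducibility also depends on both machines running the same library versions and comparable hardware, so the "lossless" part comes with some caveats.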

Edit:

Tap for some quick napkin math

For transmitting the information to generate that image, you would need about 1KB to allow for 1k characters in the prompt (if you even need that much),
then about 2 bytes for the height,
2 for the width,
8 bytes for the seed,
and less than a byte each for the CFG scale and the step count (but we'll just round up to 2 bytes total).
Then, you would want something better than just a parity bit for ensuring the message is transmitted correctly, so let's throw a 32- or 64-byte hash on the end...
That still only puts us a little over 1KB (1,078 bytes)... So for generating a 4k image (a .PNG file) we get ~24MB worth of lossless decompression.
That's 24,000,000 bytes, which gives us a compression ratio of roughly 20,000x.
But of course, that's still going to take time to "decompress", along with a decent spike in power consumption for about 30-60+ seconds (depending on hardware), which is far from anything "real-time".
Of course, you could also be generating 8k images instead of 4k images... I'm not really stressing this idea to its full potential by any means.

So in the end you get compression at a factor of more than 20,000x for using a method like this, but it won't be for low power or anywhere near "real-time".
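
A minimal sketch of that byte budget in Python, using the numbers from the napkin math above (the 64-byte hash is just one possible choice for the integrity check):

```python
# Rough byte budget for the "prompt + settings" payload described above
prompt_bytes   = 1000      # up to ~1k characters of prompt text
height_bytes   = 2
width_bytes    = 2
seed_bytes     = 8
cfg_step_bytes = 2         # CFG scale + step count, rounded up
hash_bytes     = 64        # e.g. a SHA-512-sized digest for integrity checking

payload = (prompt_bytes + height_bytes + width_bytes
           + seed_bytes + cfg_step_bytes + hash_bytes)
print(payload)             # 1078 bytes, a little over 1KB

png_size = 24_000_000      # ~24MB 4k PNG
print(png_size // payload) # ~22,000x, i.e. "roughly 20,000x"
```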

[–] [email protected] 1 points 3 months ago (1 children)

Shout-out to Archive.org for all the awesome work they do to backup what they can from the internet.

(Especially when some Stack Overflow answer to a question is just a link to some website that has either changed or no longer exists).

[–] [email protected] 3 points 3 months ago

Sure, but the problem is that our language has evolved and "AI" no longer means what it used to.

Over a decade ago it was mostly reserved for what you're describing (which I would call "AGI" now). However, even then we did technically use "AI" for things like NPCs in video games. That kind of AI just boils down to a bunch of If-Then statements.
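
For example (a made-up snippet, not from any particular engine), that older kind of game "AI" is really just branching logic:

```python
# Classic rule-based game "AI": nothing but conditionals
def npc_action(health: int, distance: float, attack_range: float = 2.0,
               sight_range: float = 15.0) -> str:
    if health < 20:
        return "flee"      # run away when badly hurt
    elif distance <= attack_range:
        return "attack"    # attack if the player is in reach
    elif distance <= sight_range:
        return "chase"     # chase if the player is visible
    else:
        return "patrol"    # otherwise wander a preset route

print(npc_action(health=100, distance=10.0))  # "chase"
```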

[–] [email protected] 3 points 3 months ago

I think any time "AI" is involved, journalists should be much more specific about what exactly they're talking about: LLMs, computer vision, generative models (text/image/audio), upscaling (it can start to get a little muddy between upscaling and generative models, depending on the implementation), TTS, STT, etc.

I definitely agree that "AI" has been abused into the definition it is now. Over a decade ago "AI" was mostly reserved for what we have to call "AGI" now.

[–] [email protected] -1 points 3 months ago* (last edited 3 months ago)

Sure, but don't let that feed into the sentiment that AI = scams. It's far too broad a term, covering a ton of different applications that already work, to be used that way.

And there are plenty of popular commercial AI products out there that work as well, so trying to say that "pretty much everything that's commercial AI is a scam" is also inaccurate.

We have:

- Suno's music generation
- NVIDIA's upscaling
- Midjourney's image generation
- OpenAI's ChatGPT
- Etc.

So instead of trying to tear down everything and anything "AI", we should probably just point out that startups using a lot of buzzwords (like "AI") should be treated with a healthy dose of skepticism, until they can prove their product in a live environment.

[–] [email protected] -2 points 3 months ago (2 children)

If you think that "pretty much everything AI is a scam", then you're either setting your expectations way too high, or you're only looking at startups trying to get the attention of investors.

There are plenty of AI models out there today that are open source and can be used for a number of purposes: generating images (Stable Diffusion), transcribing audio (Whisper), audio generation, object detection, upscaling, downscaling, etc.
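
For instance, transcribing audio with OpenAI's open-source Whisper model is only a few lines (assuming the openai-whisper package is installed; the file name is just a placeholder):

```python
import whisper

# Load one of the open-source Whisper checkpoints (downloads on first use)
model = whisper.load_model("base")

# "interview.mp3" is a placeholder path to whatever audio you want transcribed
result = model.transcribe("interview.mp3")
print(result["text"])
```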

Part of the problem might be with how you define AI... It's a much broader term than what I think you're trying to convey.

[–] [email protected] 1 points 3 months ago

Interesting. Didn't know this was a feature. How does the filter work? Completely hides comments/posts containing the keyword?

Or does it do something like Steam does by turning every letter into a ♥?

I tried to add the word "swipe" to the filter list, but I can still see the other 2 comments on this post along with the word "swipe".

[–] [email protected] 14 points 4 months ago (2 children)

I'm just glad to hear that they're working on a way for us to run these models locally rather than forcing a connection to their servers...

Even if I would rather run my own models, at the very least this incentivizes Intel and AMD to start implementing NPUs (or maybe we'll actually see plans for consumer-grade GPUs with more than 24GB of VRAM?).

[–] [email protected] 4 points 4 months ago (1 children)

Can you provide some context for this? Which petition is this about?

[–] [email protected] 5 points 4 months ago (1 children)

@sugar_[email protected] proposed this theory the other day, and I think it makes a lot of sense. A lot of journalists are feeling threatened by the onslaught of LLMs, so I would expect to see a lot more news attempting to shine a negative light on LLMs in any way possible.

https://sh.itjust.works/comment/11586805

[–] [email protected] 3 points 4 months ago (1 children)

There's a place for AI in NPCs, but developers will have to know how to implement it correctly or it will be a disaster.

LLMs can be trained on specific characters and backstories, or even "types" of characters. If they're trained correctly, they will stay in character and be reactive in ways no scripted character ever could. But if the devs are lazy and just hook it up to ChatGPT with a simple prompt telling it to "pretend" to be some character, then it's going to be terrible, like you say.

Now, this won't work very well for games where you're trying to tell a story, like Baldur's Gate... instead, this is better suited to open-world games where the player is interacting with random characters that don't need to follow specific scripts.

Even then it won't be everything. Just because an LLM can say something "in character" doesn't mean it will line up with its in-game actions. So additional work will be needed to tie actions to the right kind of responses.
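
One possible way to do that tie-in, sketched with the OpenAI Python client purely as a stand-in (the model name, character sheet, and action list are all made up):

```python
import json
from openai import OpenAI

client = OpenAI()

CHARACTER_SHEET = (
    "You are Maren, a grumpy blacksmith in the town of Eastvale. "
    "You never break character. Reply ONLY with JSON containing two fields: "
    "'dialogue' (what you say out loud) and 'action' (one of: 'idle', "
    "'offer_quest', 'open_shop', 'refuse')."
)

def npc_reply(player_line: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": CHARACTER_SHEET},
            {"role": "user", "content": player_line},
        ],
        response_format={"type": "json_object"},
    )
    # The 'action' field is what the game engine actually hooks into,
    # so the NPC's words and its in-game behaviour stay in sync.
    return json.loads(response.choices[0].message.content)

print(npc_reply("Got any work for a wandering sellsword?"))
```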

If a studio is able to do it right, this has game-changing potential... but I'm sure we'll see a lot of rushed work before anyone pulls it off well.
