Hacker News

This community serves to share top posts on Hacker News with the wider fediverse.

Huge proportion of internet is AI-generated slime, researchers find (futurism.com)

submitted 8 months ago by [email protected] to c/[email protected]

There is a discussion on Hacker News, but feel free to comment here as well.

[–] [email protected] 11 points 8 months ago (1 children)

I'm noticing more of this on sites I've never visited. It's all word salad.

If the internet continues like this, it'll be next to useless.

[–] [email protected] 13 points 8 months ago* (last edited 8 months ago) (2 children)

This makes me irrationally angry when I'm looking for technical information. The preview looks reasonable. Click on the link, and it's just a word salad of technical terms, structured in an intelligent way, but completely devoid of meaning.

Search engines are screwed, and possibly future AI training as well.

[–] [email protected] 7 points 8 months ago

Yeah, websites designed as Q&As are the worst, too.

The first few questions and answers make some sense, and then it just devolves into off-topic nonsense that has some of the keywords you originally used in your search.

The problem is, if you don't know enough about a topic, you can't even assess whether it's real or crap.

[–] [email protected] 2 points 8 months ago* (last edited 8 months ago)

Ironically, the best way I found to combat this is to use search engines that summarize result pages with AI (e.g., Bing Copilot or Perplexity).

It still sucks even with those options, but it at least reduces the need to go through several pages of results before finding the first relevant one. Still, the LLMs of those engines hallucinate regularly and give very naive answers, so they're mostly useful for finding relevant sources IMO.

Disclaimer: I pay for Perplexity. I use Perplexity every day, but I haven't tried Bing Copilot that much. I haven't used ChatGPT much; I find it way too unreliable and can't trust its answers. I'm neither an investor in nor an employee of either.

[–] [email protected] 5 points 8 months ago* (last edited 8 months ago) (1 children)

My fear is that Google is going to succeed in using this as an excuse to unilaterally destroy the free web like they've already been trying with attestation.

But the modern web really does suck. I'm not sure how to fix it without corporate influence.

[–] [email protected] 7 points 8 months ago (1 children)

Curation is my answer. Return to the old ways of curating your own lists of resources and sharing them with other people: web rings, blog rolls, link sharing, RSS.

[–] [email protected] 1 points 8 months ago

This

[–] [email protected] 2 points 8 months ago

This is the best summary I could come up with:

Amazon has also had a notably rough go with AI content; in addition to its serious AI-generated book listings problem, a recent Futurism report revealed that the e-commerce giant is flooded with products featuring titles such as "I cannot fulfill this request it goes against OpenAI use policy."

Elsewhere, beyond specific platforms, numerous reports and studies have made clear that AI-generated content abounds throughout the web.

But while the English-language web is experiencing a steady — if palpable — AI creep, this new study suggests that the issue is far more pressing for many non-English speakers.

What's worse, the prevalence of AI-spun gibberish might make effectively training AI models in lower-resource languages nearly impossible in the long run.

To train an advanced LLM, AI scientists need large amounts of high-quality data, which they generally get by scraping the web.

If a given area of the internet is already overrun by nonsensical AI translations, the possibility of training advanced models in rarer languages could be stunted before it even starts.

The original article contains 465 words, the summary contains 169 words. Saved 64%. I'm a bot and I'm open source!