Reddit will block the Internet Archive
https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit
lol i think that might be the worst/best thing I have seen in a long time
Unrelated but is your username a play on benzene?
yes, in a way. this benzene ring
there was a band called Hum and in one of my favorite songs of theirs called The Scientists, the song talks about a couple who are scientists and creating and experimenting with drugs.
she tells him to keep this benzene ring around your finger, and think of me when everything you ever wanted is about to end
i fucking love that song but that moment in the song is just peak layers upon layers of music and poetry and love and adventure.
https://youtu.be/7IPDsUGBv64
Wow, what a memory trip! I listened to that song, I don't think I have heard it before, but it is great! I'm pretty sure I heard a different song from them at the time, but they probably live in my mind from looking at BMG and CD warehouse catalogs at the time. Other artists have popped up over the years from there.
I'm glad I asked, and thanks for answering! Somehow that took me back to my Candlebox days.
their big hit was called Stars
https://youtu.be/gMEB4HNNZ2I
oh yeah, I listened to Candlebox. They didn't put on a good live show sadly
Ya, that was the one. I listened to it, also. The very beginning sounded very familiar but I'm not sure about the rest. But maybe it's been 30 years and, well 🤷‍♂️ I never saw Candlebox live, guess I didn't miss out. I really liked Alice in Chains but only got to see Cantrell tour while "waiting"
I never got to see AiC either 😩
Core memory unlocked! I remember catching a couple of mix demo CDs at the '01 Warped Tour, and Hum's Stars was on one. I actually got the green album with the zebra on it shortly after.
fuck spez
Cuck boy getting pegged by post top op Garfield is definitely not something I had jotted down in my day-at-a-glance.
I would have at least expected him to ask Spez to put some lasagna on his bumhole as lube.
Art.
What a terrible day to have eyes.
Given that the Internet Archive is the de facto standard way to cite material as seen on a given date --- they're a trustworthy party that will probably persist for a long time --- that's going to make it harder to cite content on Reddit.
It's important for people writing papers and such who need to cite material.
I wonder if there's some way to use the TLS certificate to get a cryptographically-signed copy of a webpage with timestamp that someone could later validate as having been downloaded on that date. I don't know if existing TLS libraries are capable of that. Like, Web browser menu option "Store cryptographically-signed webpage". Absent a later certificate compromise, I'd think that that'd at least provide people a way to credibly say "this is really what was on that webpage on August 15th, 2026". Like, you'd have to save a copy of the TLS session and then have libraries that could read and validate an already-generated session. The timestamp is already embedded in the session.
Some protocols, like OTR, are designed to specifically not allow that, but AFAIK, TLS could.
EDIT: Well, technically the timestamp is gonna be during the handshake, not tied to the HTTP request internal to the TLS session. It might be possible to game that by establishing a TLS session, holding it open without activity, and issuing a request much later. I'd think that that'd potentially be disallowed by Web servers one way or another, since otherwise you could probably do a denial-of-service attack by holding open a lot of sessions for a long time.
EDIT2: Oh, wait, no, shouldn't be an issue, because the HTTP Date response header is gonna have a timestamp tied to the response.
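For what it's worth, pulling that timestamp out of a response is simple. A minimal sketch, assuming the headers arrive as a plain dict; the function name and the sample header values are made up for illustration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_timestamp(headers: dict[str, str]) -> datetime:
    """Return the server-reported time from an HTTP Date header as an aware UTC datetime."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "date":
            return parsedate_to_datetime(value).astimezone(timezone.utc)
    raise KeyError("no Date header in response")


# Hypothetical response headers, not taken from any real capture.
sample_headers = {
    "Content-Type": "text/html",
    "Date": "Fri, 15 Aug 2025 12:00:00 GMT",
}
stamp = response_timestamp(sample_headers)
print(stamp.isoformat())
```

The header is only as trustworthy as whatever protects the response in transit, so on its own this proves nothing to a third party.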
Unfortunately, it'll be more than that, as that'll be saving the plaintext files transferred internal to the TLS connection. The information that would need to be saved will normally just be thrown out, as it'll be the TLS connection itself.
On second thought, though, I don't think that it'd be viable, since the way that something like this normally works is to just use (slow) public key encryption to transfer a symmetric session key and to then use (fast) symmetric encryption on the bulk data, and once you have a copy of the session key, you could forge whatever you want with it. This would only work if you were using asymmetric encryption to encrypt the data in the connection.
kagis
https://www.cloudflare.com/learning/ssl/what-is-a-session-key/
Yeah. Oh, well. It was a happy thought for a moment.
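To make the session-key problem concrete, here's a rough sketch. It uses a plain HMAC as a stand-in for the integrity check on the bulk data (real TLS cipher suites differ in detail, so treat this as an illustration of the shared-key issue only). All names are invented for the example:

```python
import hashlib
import hmac
import os


def tag(session_key: bytes, payload: bytes) -> bytes:
    """Authenticate a payload with the shared symmetric key."""
    return hmac.new(session_key, payload, hashlib.sha256).digest()


def verify(session_key: bytes, payload: bytes, mac: bytes) -> bool:
    """Check a payload/tag pair in constant time."""
    return hmac.compare_digest(tag(session_key, payload), mac)


# After the handshake, client and server both hold the same key.
session_key = os.urandom(32)

# What the server really sent, with its tag.
genuine = b"<html>what the server actually sent</html>"
genuine_mac = tag(session_key, genuine)

# The client holds the same key, so it can tag any content it likes.
forged = b"<html>something the server never sent</html>"
forged_mac = tag(session_key, forged)

server_ok = verify(session_key, genuine, genuine_mac)
forged_ok = verify(session_key, forged, forged_mac)
print(server_ok, forged_ok)
```

Both checks pass, so a saved session can't show a third party which side produced the data. That would take a signature made with a key only the server holds.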
As somebody who often ends up using Reddit like Stackoverflow and in some cases needing the Internet Archive (IA) to find the original post after it’s been deleted or garbled, I think this is a wake-up call for those who go to Reddit both to get technical help and to post it. More than ever, Reddit is becoming an unreliable place to find answers for old obscure issues and if they are going to lock out places like the IA then I think it’s time people stopped contributing their solutions to Reddit.
Searching anywhere in general is getting shittier and shittier by day. Web searches are riddled with hallucinated AI generated garbage pages. Finding the right answer for difficult problems is getting worse and worse. We are sliding rapidly into Idiocracy.
Not to mention so many projects putting their support in walled garden chat services like Discord that you can’t even search via search engine. Even if you can figure out who asked the right question and when, you have to trawl through a sea of inane garbled chat to get to the developer/expert response.
Specialised topic forums really need to make a resurgence but I doubt they will.
Seeing this happen has been one of the saddest most desperate parts about watching the internet dying.
It was obvious what was going to happen years ago, that didn't stop people from acting like I was a reactionary foolish cynic when I voiced concern about this though.
Seriously FUCK Discord (and Reddit).
Buddy, we are already there. “Ow, my balls!” Would be high-brow tv these days.
"Ow, my balls!" was already a thing in the 90s, on BIG time tv. It was called america's funniest videos.
Ah, back when it was “America’s Funniest Home Videos”. Yes, they pioneered the crotch-smashing format. I’m just saying, shit like Real Housewives makes getting hit in the balls look like Masterpiece Theatre.
The two things that just make me boggle, specifically about that, was just how filthy bob saget was (sort of like robin williams) in his comedy outside of the tv roles, and apparently how much straight up home-made porn was sent in to that show.
I wish his 2007 stand up 'that ain't right' was still around. He was fucking hilarious.
yup. continuing to feed them traffic after their repeated attacks on the userbase is just sad. stop using them. yeah it sucks the info is gone, but acting like they'll wake up and change is absurd.
When I joined Lemmy I decided it was unwise to trust anything on Reddit less than a year old. Now it's anything under two years old.
most of my technical questions about Linux are not even answered lol. So difficult to get good answers on reddit.
Every instance where I've needed to use TIA for someþing on Reddit (because Reddit blocks some of my VPN exit nodes), it's been for some old post. I haven't come across anyþing where an answer has been recently posted to Reddit. Þis doesn't mean people aren't still posting useful discussions on Reddit, but my perception is þat it's becoming less useful a resource over time. Maybe because þe knowledgeable people have mostly migrated off?
Ofttimes what I've looked up in TIA for Reddit was already cached. Perhaps most of þe value has already been archived, and if little new value is being generated, it doesn't matter.
Þe upshot is, I'm not sure how much effect þis will actually have.
exact same here. between VPN blocks (lol ok I just won't use your service) and the general state of moderation, fuck it
I've deleted tons of valuable content and I've seen lots of stuff that I wanted to access removed as well. it's annoying, but oh well. other forums will remain
Oh, me too! Scorched earþ, when I left. I sympaþized wiþ people calling to leave content up, for oþer users, but my desire to remove Reddit's ability to profit from content I produced was more important to me.
Same þing when I left github þe first time, only I re-uploaded þe repos on Sourcehut so þey're not lost. But I purged everyþing on github. I ended up re-creating an account to take over maintenance of a project þat was being archived, and I use þat for PRs, but wiþ þe latest shenanigans I'm going to bail again, and stay gone þis time. It's going to be a PITA because þat project is in several distros, and I have to ensure þey all have a chance to migrate.
It’s another move to protect against AI scraping that isn't paying them for access.
Wasn't Reddit complaining a couple of years ago that too many AI bot crawls were stressing their servers?
Doesn't the internet archive relieve that stress?
I think that was probably the real reason for the block, the Internet Archive is too functional, scalable and accessible of a service for reddit's lame excuses about needing to gatekeep access to the community created content on their website to not make reddit look totally stupid unless they came up with an excuse to block the Internet Archive.
I gave up on Reddit a long time ago. Deleted everything.
When RIF died, Voyager became the new forum app for me.
Apollo and Voyager for me so I straight-up retained the same UI.
Is it better than summit? I'm on summit now and pretty happy with it so far. Never heard of voyager though
If you ever used Apollo for reddit, this is 99% the same. I haven't used summit so I can't compare unfortunately.
Yep. Voyager is way nicer. Good call friend
Maybe I should try voyager too
Thanks for sharing. I will check it out
Yup, same here.
this is the way.
As long as the previous collections of archives are still intact. We probably don’t need all of their new spam posts in the wayback machine anyway
It is my understanding that if you block the wayback machine from indexing your site it will also delist the history as well.
They do archive sites against the owners' wishes when they consider it an important site for public archiving, like some news sites. They are under no obligation to delete the archives, and I hope they don’t.
Parties have archived the data from pushshift, which covers a lot of Reddit history.
kagis
https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4
I mean, that won't have the past half year or some low-traffic subreddits, but...
The ability to block crawling is separate from the ability to delist old pages. The latter usually happens after domains change owners
LOL I should have scrolled down. You said what I said, with fewer words, first.
Just more vindication for my ditching that trash heap of a platform. YT is probably going to be the next platform I ditch as they're going full Reddit now.
It's a matter of time before third-party YT front-ends start getting throttled or outright blocked like third-party Reddit front-ends.
YouTube's already throttling users in their mobile site. They have these massive channel cards in their feeds and the video titles/thumbnails disappear after a few offerings, leaving you with the ability to blindly click on a video.
I've declared my YT channel to be dormant starting on the 13th due to this AI age-gating crap.
I wanna see if YouTube is that stupid they send my 18+ year old YT account an age verification check. April 2007 feels like a long time ago.....
Dumping YT / gMaps / Google SSO etc and replacing them bit by bit is a hard vice to break, but I've got others using self hosted shit now (yay Immich and Jellyseerr arr....) and I'll keep on doing it for others too.
Knowing the corrupt pricks that Google are, I wouldn't put that past them. The age-gating isn't even about protecting the kids, it's about censorship.
Time to use peertube
And Invidious while being logged out of YT while that's still an option, but I have both a PeerTube and Odysee set up already.
I seem to have the best luck with the inv.nadeko.net instance and to a lesser extent the invidious.nerdvpn.de instance, and both instances proxy by default.
People who posted on Reddit ( speaking in the past tense, because who would continue to do so now that we have better things? ) never intended for it to be of limited access. Reddit was a publicly accessible place, and people shared their thoughts and comments on it because it was the frontpage of the internet, so the place of choice to share things with the world. That being scraped should not be a problem. But clearly Reddit didn't want to give you a platform to share your thoughts with the world, they wanted you to donate your thoughts and take it as their property so that they can capitalize on it.
I don't know... I mean, I agree. But I'm seeing a lot of demands that instances should prevent scraping. Ok, it could be astroturf; a campaign by Reddit/data brokers to neutralize the free competition. But you have seen all those deleted posts on Reddit. Those are some special little minds.
you're right, there's probably some anti-ai/anti-scraping folks on there as well as here. Personally I most definitely hate intellectual property more than I do generative AI. But you're right, different people on there will feel differently. But the point still stands that for those who thought they shared their thoughts with the world, their ideas that they donated were taken from them.
That place is becoming more and more of a shithole. Bots, Ads, trolls, garbage mods… deleted the app last month.
I quit reddit, cold turkey, the day they shut off free API access for 3rd parties. Except for a couple of fairly niche subs I haven't missed it at all.
Same here. I've been better off ever since.
This is a huge blow to archivism, thanks to corporate greed and enshittification of reddit. Worst MBA filled POS.
Yeah, wouldn't want those AI companies to get all that data for free. Gotta make 'em pay for it.
Instead of regulating tech, they are going the fuck over everyone route.
Oh no, someone might not be paying them for their user generated content (!)
To be fair, it's probably best that history forgets this period of the web...
and thus it repeats
don't worry, we easily repeat what we "learned" anyway
That's the kind of talk that can get you banned from Reddit. 😜
Damn you Spez.
So reddit will become even less valuable
I am new to Lemmy, is there a fuckreddit sub?
In a way, the entire lemmy community is the fuckreddit sub
Why would you want to spend more time thinking about a dead site?
I just like to laugh at things I dislike. And I also like to see how bad it's getting. I was in the undelete sub and it was amazing.
Yes.
Hi welcome to Lemmy, we hate reddit here.
https://lemmyverse.net/
This is a great site to search for communities. Doesn't seem like there is one.
Yes
If you seek a pleasant public forum, look about you.
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
Careful. Lemmy is too small to draw the attention of sophisticated, persistent abuse. As a company, Reddit has struggled with revenue and we've all seen those struggles quite publicly. Lemmy instances with those same challenges would probably just fold and close up.
Federated networks give you freedom but the potential for abuse is proportional to that freedom while at the same time, federation is far more expensive taken as a whole.
Can confirm. I set up a pixelfed instance for my city with the goal of moving people from Insta to this version. After about three months, user accounts went from 1-10 signups a week to a hundred a week.
No way did that many business owners sign up. And yep, all spam.
After a while, my random weekend project in Spring became a full time job. I closed it last month.
I've thought of doing something similar, and think, while the federated spam is hard to deal with, signup spam is manageable if you somehow restrict signups to the actual community you want to support. Open signup on the web is a nightmare.
For a city, an interesting idea might be to only allow signups on a dedicated, physical wifi AP placed somewhere strategic in your city. People would literally have to go to a physical location to sign up. Piggy-backing on a library system would be another option if you could somehow get them to buy-in.
I'm sure it would persist even after an event of malicious activity. It may just turn out like email with servers needing to be added to an allowlist at worst and more moderation. I think scalability might be the limiting factor at some point though and as a result we could end up with several disconnected islands of server clusters instead of globally meshed servers.
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
No harm in that. To each their own. :-) Everyone gets to decide at least.
I think if the fediverse was ever to become more mainstream, it would naturally splinter. For example, the corporate stuff would be big, and those people who value the small-instance experience we have now would probably de-federate from it. There would always be small fediverses, even if the big fediverses got REALLY big.
Does anyone have any good tech-related forums on Lemmy? I’m still digging around as I find a lot of interesting but “Quiet” ones.
Gate them in their own space.
Just make your own invite-only server if you're so worried about it. Digital freedom should be for everyone, not just a few antisocial nerds.
I'm not worried about anything.
Well, clearly you are, or you wouldn't suggest that most people should stay on (what I think we both agree to be) an inferior platform that affords them fewer freedoms.
If you're worried that somehow that would bring unwanted attention or a bad crowd, you can always sequester yourself in a more niche server. That's the whole point of this federated system to begin with - giving us more control of our digital presence.
'freedom' as long as the mod agrees with you.
Fuck Reddit
Nice of them to protect their (users') content from AI scraping. So that they can charge AI companies for it instead.
They aren’t doing that. They are protecting content from being scraped for free. Reddit is perfectly happy to charge for AI access to user-generated content.
No, that's not what's happening. They're preventing scrapers from accessing the content at no charge. They're totally willing to make deals for access to their content in exchange for money.
Almost, but they are really making it so they can charge AI companies for user data and not allow scrapers to get the data for free.
Fuck Reddit and Fuck Spez.
They can keep their shit for themselves, stopped caring a long time ago.
fucking reddit...
Time to just ignore them and scrape it anyways
OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.
Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.
I'm assuming IA will continue to host their historical archives of Reddit, they'll just not have any new captures after this. Unless IA has said otherwise, it'd be very strange to wipe their archive of Reddit
Reddit is archived and available as torrent up until the API change.
You can't remove it. It's there forever.
Wrong.
You can request deletion of archived pages.
And you think reddit actually deletes it? Risk data loss? All that valuable data? No way. They might shadow delete it, but it's there forever.
both of you are correct because you are speaking of different things
There are some browser extensions that will edit your comments and replace each one with a bunch of random words. I do not know how effective they are so I cannot vouch for them.
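The overwrite step itself is trivial. Here's a sketch of just that part, assuming a made-up word list and function name; it doesn't talk to Reddit at all, and I don't know how any particular extension actually does it:

```python
import random

# Hypothetical word list; a real tool would likely use a much larger one.
WORDS = ["lantern", "gravel", "otter", "velvet", "compass", "thimble", "orbit", "maple"]


def scramble(comment: str, rng: random.Random) -> str:
    """Replace every word in the comment with a random word, keeping the word count."""
    return " ".join(rng.choice(WORDS) for _ in comment.split())


original = "Here is the fix that worked for my obscure driver issue"
overwritten = scramble(original, random.Random(0))
print(overwritten)
```

Whether the site keeps earlier revisions of an edited comment is a separate question that this does nothing about.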
I know that if you tried to just delete the comment, the information would still be there but the username is deleted. Which is frustrating, I didn't know that until I had already deleted every post and comment, went back to make sure the job was done. It wasn't. I just came to terms that at least I wasn't contributing to their hub of knowledge anymore.
In the lieu of an IPO u/spez has actively destroyed everything that made Reddit good! Gate keeping the API thinking it'll help with making some bigshot LLM some day lol
Lol every platform seems to live long enough to shoot themselves in the foot.
Enshittification
Phpbb/mybb/smf haven’t seemed to do that.
When reddit has mutated a few more times, they'll start erasing stuff themselves. It will be lost to time and that fills me with hope.
This company limited search crawlers to google, why are you surprised?
Fuck Spez
Is that even possible?
Technologically no. Reddit sends out the data to 10s of millions of users as part of their normal operations. They need to try to block those who collect that data for the IA. Reddit has the very short end of the stick.
The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.
Not to mention all of Asia, South America, Africa...
Slightly related, can you explain how (a few times for me) an archived page I tried to revisit got erased?
I don't know their take-down policy. Could be privacy, could be copyright.
I think they are shielded by Section 230 under US law. That means, if they don't do take-downs when requested, they become liable just like the original uploader. So it depends on whether they think they can defend something as fair use. IDK what they do with requests under non-US laws.
Thanks for your detailed explanation.
When I look that up it's specifically about 'defamatory, illegal, or harmful content'.
That would be understandable to take down.
Never encountered that myself, the cases I'm referring to were totally legal content AFAIK.
Only very damaging or proof of something.
As a hypothetical example, let's say an organisation posts it's associated with Epstein in 1999 which now obviously is very inconvenient.
They understandably remove it from their website but it should still be on the archive if captured before.
However, in similar controversial real cases it wasn't.
So it appears certain forces have more influence to get them to remove content beyond what's legally required.
Since then I always screenshot the archive page.
Hmm. There are many things that could cause legal trouble for the Wayback Machine. I wouldn't jump to conclusions.
You can see on Lemmy that many people would prefer to outlaw scraping, fair use, and all that. Well, not for the "good guys" obviously, but the law doesn't work on vibes. The IA would be legally impossible in most countries. In the EU, it would be a major crime because of copyright and GDPR. It's only the traditional US commitment to free speech and fair use that makes it possible at all.
The IA exists in a legally precarious position. That's not because of any shady backroom dealing. If the crowd in this community had its way, it would be gone.
I know the EU has different (stricter) laws and that they vary between states. (Germany being particularly awful)
There is however some complicated form of fair use policy.
If the IA hosts music and books that might be problematic.
But I'm talking about archived webpages and information previously available to the public with zero commercial value that has been removed.
And this includes American sites.
It is still "intellectual property". Maybe the policy is to just oblige removal requests if the content doesn't seem to be of public interest. Cause why not, right? Look at all the people here on Lemmy angry that their worthless posts are scraped or deleting them on Reddit. Obliging takedown requests is certainly the path of least resistance.
Not that reddit isn't hot garbage right now, and has been for a while actually, but there's a lot of people here who have glazed over the reason why reddit instituted this policy.
AI companies are scraping the Wayback Machine. This is something that should concern all of us.
Why?
Circumventing sites with 'no ai scraping' rules
And what do I care about Reddit getting paid?
If the IA doesn't complain about being used, then it's fine for me. The ideal outcome would be, if the archive can make some arrangement where they scrape the data and provide it to everyone. That way, sites only get scraped once and not constantly hammered.
There are plenty of sites out there not owned by major conglomerates that have norobots and noscrape tags, and AI companies can use Wayback as a way to circumvent those policies.
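For anyone who hasn't looked at how these opt-outs work, here's a minimal sketch using Python's standard robots.txt parser. The robots.txt contents, the bot name "ExampleAIBot", and the URLs are all invented for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a small site that opts out of one AI crawler.
SITE_ROBOTS = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://smallsite.example/post/1"
blocked_direct = may_fetch(SITE_ROBOTS, "ExampleAIBot", page)
others_allowed = may_fetch(SITE_ROBOTS, "SomeOtherAgent", page)
print(blocked_direct, others_allowed)
```

The rules are voluntary and apply per host. A crawler reading a copy from some other host consults that host's robots.txt, not the original site's, which is the gap I'm describing.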
This isn't about reddit, it's about AI companies stealing everything on the internet and then selling it back to you while taking your job away.
This is why we can't have nice things. Tell you what. I will have as much support for you, as you have for blue collar workers. Sound fair?
Since I'm a union worker, sounds good.
Ahh, the next Ronald Reagan.
If you can't archive something, did it ever really exist?
In a causal sense, yes. In a 'the average person is fucking stupid' sense, no.
Reddit warned my account (first warning in 10 years) and deleted the comment when I told an American they can strike peacefully to show the government they are against it.
I got a warning for recommending violence from an AI; the human that checked it agreed and didn't remove the warning haha.
Reddit is just afraid that their censorship will go public.
I was on Reddit for like 15 years, then got all my warnings and a ban in like a month or two earlier this year. Oh well, lol.
I just replied “Liar, or fucking liar.” To every republican lie I saw. Only took 2 days for a permaban. I feel if they can lie we should be able to call them out on it at least.
I was on reddit for 11 years before getting banned due to zionists. I have a throwaway reddit account now for porn and other shit, but I don't post.
AI can scrape books and journals for info, but can't scrape Reddit?
Reddit can be scraped just as much as online books and journals.
So what's the point of this?
Other sites, eg with books and journals, are doing the same thing. They hope that they can extract more money by reducing the availability of their content.
Yes. Rules for thee.
And I will block reddit.
what's a reddit?
You use it to scratch your butt I think.
Another nail in the coffin.
I stopped using reddit long ago.
That means big news is coming, and the media doesn't want to fuck up the reporting that is coming. Reddit is preparing for mass submission of articles