Nearly all of Spotify has been scraped and is available via torrents
https://www.soundguys.com/nearly-all-of-spotify-has-been-scraped-and-is-available-via-torrents-150475/
That’s disgusting. Where would you find such torrents?
https://annas-archive.li/torrents/spotify
The album art torrent is a goldmine. Such a pain in the ass sometimes to find high quality album covers.
Have they actually been indexed?
There is 200gb of just metadata
4TB if you include all with popularity=0 iirc
What I used to do is google the name of the album and append to it "cover itunes". Usually I would find high quality images of the albums that way
The fact that annas archive exists despite how fucked up everything is right now gives me hope. Every time I start to feel cynical about the future I remind myself that there's people out there working to preserve the art and culture of our modern era with all the most powerful corporations and governments working against them, and they're succeeding.
Anyway, thanks for coming to my Ted Talk.
100%
...and Archive.org
And Wikipedia.
None of these are audio torrents.
That's not released yet.
Shit me, 200gb in metadata, 2.2TB in cover art before we even touch a piece of music! Wild
It's also crazy when you realise the amount of knowledge an experienced data analyst could gain from 200gb of metadata.
People are saying it's 300TB but this link is only 200GB why?
The 200GB is the metadata sqlite database only
God damn! That's essentially just text, right? Or would it also include album cover art?
Basically, the id3 tags for the music files. However, Spotify uses several more nonstandard tags in their database, some of them are great to make playlists.
It includes cover art and also preview clips. In this blog post you can read what their database contains: https://annas-archive.li/blog/backing-up-spotify.html
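For anyone who has never poked at a SQLite file, here is a minimal sketch of what querying by popularity could look like. The table name `tracks` and its columns are purely hypothetical (the thread does not describe the real schema), and the rows are made-up sample data in an in-memory database.

```python
import sqlite3

# Hypothetical schema and sample rows: the real database layout is not
# described in this thread, so everything here is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (id TEXT PRIMARY KEY, title TEXT, popularity INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [("t1", "Example A", 0), ("t2", "Example B", 42), ("t3", "Example C", 87)],
)


def top_tracks(conn, min_popularity, limit=10):
    """Return (title, popularity) rows at or above min_popularity, most popular first."""
    rows = conn.execute(
        "SELECT title, popularity FROM tracks "
        "WHERE popularity >= ? ORDER BY popularity DESC LIMIT ?",
        (min_popularity, limit),
    )
    return rows.fetchall()


print(top_tracks(conn, min_popularity=1))
# [('Example C', 87), ('Example B', 42)]
```

Against the real file you would pass its path to `sqlite3.connect` and adapt the table and column names to whatever the blog post documents.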
Not released yet
what a beautiful, simple and well designed website
"address not found"
Get better DNS
Nah, checked Wikipedia. Germany is one of the countries that entirely blocks the site.
Luckily I live in Denmark, and Denmark doesn't block the archive. So when I'm back home I'll certainly be checking it out.
Not all of Germany. There is this one for example, works like a charm:
https://digitalcourage.de/support/zensurfreier-dns-server
Mullvad.net
NextDNS.io
I would never bookmark such a site! [clicks bookmark button]
Download and seed seed seed
cue Padme
‘And avoid it?’ — ‘To avoid it, right?!’
Did not see this coming when I built my 40TB NAS
Get to acquiring Seagate external HDDs and shucking them for your own 3.5" drive bays before the data centers get them
Sadly my wallet is on time out
The 20TB drives I was looking at in July are up 40% :/
Pc guys haven’t had a break in like 7 years. One component or another is the hottest item for one scam or another.
Frankenstein all the way, you just have to continuously build from whatever part is cheap at the moment.
I'm building my first Linux setup and have a NAS planned out. I'm so stoked. I got a raspberry pi kit from my dad as an Xmas gift yesterday.
I also used a raspberry pi (5). People here will advise against it but for me it's been working fine so far. I can stream 4K with Jellyfin on my local network just fine. Read/write speeds aren't great but good enough for me. I used a Pi hat with 5x SATA ports and I have 5x 8TB HDDs in a custom 3D printed enclosure and I'm using ZFS RAID z1. No complaints yet.
You learn a lot more than if you were to just slap an Epyc in a box. A Pi will teach you about encoding and balancing resources. I still use everything I learnt and some of the gear, like the TerraMaster. The only reason I don't use it anymore is because I got free server stuff from work.
At 100 now and it looks like I need to quadruple
40TB LOL. Where we're going, we're gonna need more space! Back to the Space, Part IV!
My soon to be homeserver will have an astonishing 4TB of space. The CPU can handle up to 15TB (according to official specifications), but I am lacking drives that are big enough.
If I ever get the money to build a proper NAS I will 100% start going all in on storage and start doing stupid shit like mirroring Wikipedia. I will probably not start mirroring the entirety of Spotify (which would be kinda sick NGL), but I kind of have the problem that I am kind of a data hoarder that likes to store excessive amounts of stuff I will never need. In the future I will also start seeding all the music I have, but I need a VPN with port forwarding for that beforehand (I'm currently kinda broke, so won't happen that fast)
As far as I’ve read, the database is largely low bitrate files, and some AI. The value here is metadata and preservation of “rare” music.
Nope, I would not call 160kbps Vorbis low bitrate, it’s roughly quality of 192kbps MP3. Only the ”popularity=0” stuff (so stuff with so few listens that Spotify does not keep record of) were re-encoded to 75kbps Opus, which as a modern codec is much better than it sounds like but of course re-encode is not great for already lossy stuff.
For purists there are those Tidal downloader sites available everywhere for free lossless music, even 24-bit hires FLAC.
Opus is what I’m encoding my working library to. I like ripping to flac (and archiving them as such), but the advantages to smaller file sizes for the working library are worth it for me. So far, I’m really liking the format.
I keep the archive on spinning hard drives, but the opus library on ssd (which makes browsing much quicker, and no unnecessary spinning up the hard drives.)
It's not lossless but current ogg vorbis at 160kbps is absolutely transparent for the vast majority of people. That's actually what I chose to keep my own collection, I mean, outside of the lossless albums that I absolutely want to flawlessly preserve.
How does it compare to 192 and 320 kbps mp3?
Thanks.
I hear the diff between 192 & 320 (probably 256 is enough but better safe than sorry I thought so that's what I use) so 160 vorbis is very good but not totally perfect for me then.
I'd say in general it's the same, but way lighter than 320 kbps mp3. It's better than 192 kbps mp3 and as good or better than 256 kbps mp3.
Am I losing my mind? All magnet links are metadata, no?
They haven't released the music files yet
Oh my god did you guys just steal my music?! You each owe me $0.003 !!!
I got half a quesadilla and a shiny rock
You must be a very generous person, that's worth way more than what dude asked for
You barter with what you have
Is this new? Aren't most tracks already available in torrents?
Yep, most of tracks were already available on "various" sources, but this time they directly scraped the whole Spotify database.
It's really nice of them to back up the Spotify database on a distributed system, and for free! This ensures Spotify's business won't be endangered in case of a critical hardware failure.
So nice of them to help with Spotify's off-site backup.
It's new insofar as this is one big scrape. About 300TB iirc.
300tb is a lot, but it's kind of crazy to think this entire company only needs 300tb storage arrays to function. I wonder how they handle things internally. I would imagine at least 1 backup server ready to go in HA. I wonder if they have multiple regions across the country that also serve up the same setup.
They need other 300TB to store all the ads.
"Are you an incel with few friends, no job, and a deep seated hate for melanin? COME JOIN ICE!"
And now back to the Bro Jogan Experience.
have never understood the fanbase, and have never understood people who love music throwing $$$$ at spotify so they could pay brogan $$$ and give actual musicians a pittance.
Likely cloned Netflix's "netflix in a box" design, where they drop a large 200TB+ NAS in thousands of different CDN datacenters with their most popular content cached so that total traffic is minimal across the internet at large.
Spotify mainly being music with very little video likely makes this even easier.
IIRC there's still like 700TB of low popularity music missing, but it is only something like 0.4% of listens.
And they need more storage overall because they have to set up datacenters around the world - it doesn't make sense to stream tens of millions of connections across the ocean. But that also gives all the backups one would need for "free".
Afaik 300 TB is just the most popular music and around a third of all tracks. The blog post on anna's is quite entertaining tho.
There are 245 TB ssd drives now. You can almost fit that in a single drive.
This isn't the entirety of Spotify. If they had archived everything in 160kbps Ogg Vorbis it would have been 700+TB. There's a bucket load of songs that literally no one listens to.
Oh I know, I work in the industry as well. Our company backups alone for workstations and servers are just under 1 petabyte. This is then replicated to an offsite location which is also our disaster recovery location, and also stored in long term storage in Azure. This is just backups, sooo much money for backups haha. That's why I am shocked that this entire company can run off of 300tb which is a lot, but nothing when you think of it being the entire business model for them.
I think the craziest thing I've seen is we have these instruments that do genome testing and sequencing and they would create like 10tb worth of data per month. Every month they got their own 10tb drive handed to them to back up their stuff on their own on top of the ones we did for them.
I worked with visualisation of scientific data, up to 1petabyte, multi channel 3D realtime visu without degradation. One client had 1.5TB ram. Interesting times.
Not mine, because I’m not famous enough for people to pirate my music lol. It would be flattering for me to be included in this batch of scraped music.
I'd steal your music
If your Spotify popularity is not 0, you probably are in the scraped archive.
My very long game of avoiding spotify is finally paying off
It would be awesome if we had an app that allowed to stream directly from such torrents, and had a user-made recommendation system to replace the discovery algorithm :D
Stremio + Torrentio does this for TV but I haven't found an equivalent for music. Hoping to be proven wrong 🤞
Something as easy as stremio but for music. Connect to listenbrainz instead of trakt. Then only serve from the spotify collection because of their extensive metadata. With multi device sign in and syncing like stremio. Then a Kodi add on for the libreelec people.
i need a subscribe button for this
Whoever knows the answer, or when one is developed, someone please ping us all in this thread? thanks mate
Chatgpt is recommending an IPFS Cluster but I suspect this doesn't solve the problem completely.
I wonder how they are splitting it up in different torrents and how many.
They are splitting it up right? 😁
Direct stream may be phase 2
Anna's the GOAT
Sounds more like the pirate queen.
Moved to Tidal. Reviews from some tech-site said "It's like Spotify 10 years ago." as if that's a bad thing.
Loving the FLAC quality and non-enshittification.
Tidal lost me immediately when it took 3 months for them to recognize my cancelation of my subscription.
Umm
Umm what? That I wasn't happy I got charged three months for a service I repeatedly cancelled?
I think the confusion is that you basically said "Tidal lost me when they lost me."
I canceled to shift my charge date originally. Then it became permanent
A customer can come back in the future. But probably not after this.
It sounds like they lost you in 3 months, not immediately.
I moved to Qobuz. It's not owned by United States capital.
I love Qobuz. I heard they also pay the artists more fairly. It's just unfortunate that some big artists are still missing. And from what I can see subscriptions only work via the Google Play store.
I subscribed on their web to skip Google tax.
And there’s some great quality CLI applications that let you rip source quality FLACs!
I tried Tidal but they have a glitch on their app where downloaded songs get stuck and playlists stop downloading. They’ve had a bug open about it for at least three years last I checked. It may sound niche, but I’m often traveling and use offline music regularly
That's nothing compared to my old Napster collection
Saw them live 3 times before that shit. Never listened to a single song since.
Well, at least the datahoarders can preserve some things when it all goes to shit...
90% of Spotify is trash. Much like Audible, it's just choking on AI generated content and similar worthless vanity projects
Datahoarders are going to go WILD over this
data hoarders already have everything in here and far more, and the web release versions are a lower priority. thinking of red.sh here
I don't think many data hoarders are sitting on the AI generated stuff 😁
Let's put it all on a Funkwhale server.
Sure, you set it up.
Dang. You called me out on my bullshit.
Well, I've set a funkwhale server up before and I'm not dealing with that bullshit again.
I suppose this explains why I keep having to move Funkwhale servers every few months because they keep shutting down (That and the usual "server costs outweighing the donation pot" issue).
I'll chip in with the first TB...
Now do Netflix, Prime, Paramount, HBO, Disney, Hulu and Apple and we're golden.
Would be a magical day the day copyright dies.
Already done. It's called Torrent Streaming and lets you stream-on-demand anything that exists as a torrent without having to torrent anything yourself.
A client that can stream these Spotify torrents with an interface that works like Spotify (low bar, I know) will be awesome, but also including a database to match songs to artists so users can send money directly to the artists they listen to will make it revolutionary.
A stremio for music would be amazing
When copyright dies so rich conglomerates can make money by monetizing out toys and collectibles and theme parks from smaller creators content without paying a dime back to them? Copyright is beneficial. Copyright is good. It needs reform, yes, but don't mistake that with the concept being bad.
You should NOT actively do something, for money. You should EXPECT it.
Otherwise you'll eventually try to maximize profits and get rid of everything that made whatever you did good in the first place.
Abolishing copyright works only if you abolish money with it. Otherwise you're only benefiting the largest corporations. Despite what you think, it won't be the small guy winning
I'd just reduce copyright periods. Right now they are ridiculously long. No one should hold rights from 1930s works.
I buy music on Bandcamp to support artists and then stream music via my own Plex server
Didn't know Plex could handle music libraries
Yeah it's awesome, you get to spend all of your time chasing stupid issues like it picking the dumbest artist name possible and using it for 30 albums because one artist was on two of those albums, or deciding that 50% of your artists should be Lastname, Firstname but the other 50% should be Firstname Lastname. Then half the time it will use its own metadata for cover art and the other half it'll use metadata in the files. Doesn't matter how meticulous you are with your music tags and whether or not you have musicbrainz'd all of them to be consistent, it still finds a way to screw things up pretty much reliably.
But it's all worth it because PlexAmp is surprisingly good once you've done the legwork and fixed all the stupid shit and I would much rather buy music directly and self-host my own music streaming system than pay spotify monthly.
Wow this is so revolutionary.
Never in the history of the internet has music been available for free.
Not this easily with accurate tags and art it hasn't.
They have the wrong tags more often than torrents do when it's not English music. Took me way too many emails to Google Music and Spotify before they stopped screaming at me with ALL CAPS on one album and before removing the dots after album track names (1. Track Name), not to mention the ones named TRACK 1, etc.
Uhh get on a real site and it has been.
Help me out. Where?
I know, I read about this news, but nowhere did I find this torrent file that they are talking about, do you know where they sent it to what site, and if this is just for the sake of information, do I condemn piracy?🧐
I think you're looking for this page.
1st time in my life I get an Error HTTP 451. I even had to look up https://en.wikipedia.org/wiki/HTTP_451 and I have to admit the reference to Ray Bradbury's 1953 dystopian novel Fahrenheit 451 is pretty neat.
Consequently, because the proxy link you provided relies on Cloudflare, which is itself blocking access, I recommend linking first to the original content https://annas-archive.org/torrents and optionally to mirrors, proxies, etc.
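For anyone else who had to look the code up: Python's standard library knows about 451 too (`HTTPStatus.UNAVAILABLE_FOR_LEGAL_REASONS`, Python 3.8+). A small sketch that turns a status code into its registered name; the helper `describe_status` is just an illustrative name, not part of any library.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return 'code phrase' for a registered HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know
        return f"{code} (not a registered status code)"
    return f"{status.value} {status.phrase}"


print(describe_status(451))
# 451 Unavailable For Legal Reasons
```

A script checking mirrors could treat 451 differently from 404, since the content exists but is being withheld at that hop.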
https://http.cat/
genius
"Condemn" or "condone"? While similar sounding, those two words have basically opposite meaning. Yay English!
Don't even get started on our homophones. XD
Dude it's 2025 you can't use that word anymore
How many full seeds are there? I mean how many could there be? Who has 300 TB to throw at this?
On their torrent page it's explained more but it's broken up into many many torrents and you basically say how much space you're willing to host and it generates one with the least seeded "blob".
I don't really know how that would work on the back end but it seems technically impressive.
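One plausible way such a generator could work, as a guess rather than a description of their actual back end: sort the torrents by seeder count and greedily take the least-seeded ones that still fit in the space you offer. The `TorrentInfo` fields and the sample catalog below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TorrentInfo:
    name: str       # hypothetical label, not a real torrent name
    size_gb: float  # size of the torrent's payload
    seeders: int    # how many peers currently seed it


def pick_least_seeded(torrents, budget_gb):
    """Greedily pick the least-seeded torrents that fit within budget_gb."""
    chosen = []
    remaining = budget_gb
    # Fewest seeders first; ties go to the smaller torrent so more can fit.
    for t in sorted(torrents, key=lambda t: (t.seeders, t.size_gb)):
        if t.size_gb <= remaining:
            chosen.append(t)
            remaining -= t.size_gb
    return chosen


catalog = [
    TorrentInfo("blob-a", 500, 12),
    TorrentInfo("blob-b", 800, 2),
    TorrentInfo("blob-c", 300, 5),
    TorrentInfo("blob-d", 900, 1),
]
picked = pick_least_seeded(catalog, budget_gb=1500)
print([t.name for t in picked])
# ['blob-d', 'blob-c']
```

Greedy selection is simple but not optimal at packing the budget; it favours rescuing the rarest data first, which seems to be the point.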
Wow, that is technically impressive, very cool!
Given some of the collections I've seen on private trackers I'd say there is going to be quite a few seeding this in its entirety.
You mean if they wipe their collection to make space. Private sites probably have better quality than Spotify.
No, I'm saying they have so much storage I could see them having space to seed this.
Blog post about the backup
https://annas-archive.li/blog/backing-up-spotify.html
Interesting post. There was one artist with popularity = 100 but they didn’t mention who it was
Is it down for anyone else?
It's DNS-blocked e.g. in Germany (german source). Try it with a proper DNS resolver. E.g. 8.8.8.8 or 1.1.1.1
Idk I'm able to open the link.
I wish I could think anything positive about this, but I can't imagine anyone who actually cares about music needs or wants this. Instead it'll almost certainly be used as an illegal and unethical dataset to further train bullshit AI to make slop songs. As easy as it is for people to claim "preservation", I do have to question the motives of stuff like this...
Fuck AI. Support your favorite human artists.
I mean, it seems like the perfect avenue to replicate Stremio / Kodi but for music
I guess it's easier packaged in a torrent rather than individual downloads, but I do question why anyone would need this if all they were doing was training AI, as everything is already available on YouTube for free. You don't need to hack a company to get the audio. Now if you're a human trying to actually listen to the songs, obviously the Spotify torrent will sound much better, assuming the hack captured the higher quality audio streams from Spotify. All the youtube downloaders sound like crap because it's like 128kbps m4a (at least on newpipe, though you can do 160kbps opus if you want)
Same it seems useless to me. The real value is knowing how songs relate to each other in terms of being played before/after other songs, and that's only available via internal datasets that they could never scrape anyway.
Unless...?
It's a good thing if you are smart enough to understand that AI isn't going away. Universal bought udio, the "legal" variant of the dataset will be used to train models, only they will be closed source, censored and come with a ToS that gives all the rights from the generated music to the record companies from the get go.
At least this gives open source a chance.
Yeah, the people who are racing to download it all want to use it for profit. AI companies, companies that run databases, etc.
Everybody starts somewhere. Few come out the gate being Depeche Mode. That doesn't mean it's not worth the struggle to get better.
I find that unlikely when considering the current trajectory of AI.
not sure why you want that much music, most of it garbage. i would like some of the podcasts that people don't post anywhere else though. all hail the data hoarders.
Data/Culture preservation (imagine having something like this for Sumer or ancient Egypt).
Also you could just take the few things you like, not all of it.
And remember to support artists that you think deserve it (megacorps in between don't really deserve it).
I have 3TB of space to share. Are there 150 other people like me that want to do some distributed hosting? What technology can handle this?
Torrents manage this without any extra tech. Just grab the .torrent file, and only select the files you have space for. Download them then seed, and get your 150 friends to do different files. If I then go to download a file that's in your batch, the download will happen from your server (and whoever else is seeding these files), and if I go to download a file that's on one of your 150 others' server it'll download from there.
Yes, almost. But I want it to seem like I have all 400tb on my 3tb drive. I want the cache to handle downloads automatically and I want an even spread of files so that none get lost.
Torrents give me:
But they do not give me:
- Guarantees that every file stays available
- A way to say “this file must exist on N peers”
Maybe I should be typing this into an LLM.
Edit: ChatGPT is suggesting an IPFS Cluster
Edit 2: If the torrents stay active I can remove some requirements
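The "this file must exist on N peers" requirement can at least be sketched without any coordinator. The snippet below uses rendezvous (highest-random-weight) hashing, which is a swapped-in technique and not something BitTorrent or IPFS Cluster is claimed to do this way: every participant computes the same assignment locally from the peer list. Peer names and the file ID are hypothetical.

```python
import hashlib


def assign_replicas(file_id: str, peers: list[str], copies: int) -> list[str]:
    """Pick `copies` peers for a file via rendezvous hashing."""
    if copies > len(peers):
        raise ValueError("not enough peers for the requested replication")

    def score(peer: str) -> int:
        # Each (peer, file) pair gets a stable pseudo-random weight.
        digest = hashlib.sha256(f"{peer}:{file_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring peers are responsible for holding the file.
    return sorted(peers, key=score, reverse=True)[:copies]


peers = [f"peer-{i}" for i in range(150)]  # hypothetical volunteers
replicas = assign_replicas("track-0001.ogg", peers, copies=3)
print(replicas)
```

When a peer that does not hold a given file drops out, that file's assignment is unchanged, so only the files the departed peer held need a new home. It does not verify that peers actually keep their copies; that part still needs some kind of auditing.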
Ahh fair enough! Yeah, not sure about all that.
A very long time ago I remember there being FUSE filesystems that allow you to mount a .torrent like a network drive. I have no idea if these are still kicking though. Still doesn't provide all of your extra bullet points but might be a good start to the rabbit hole
Fuse looks like a good way to make the 3tb look like 400tb but it doesn't do the replication bit.
I'm commenting these ideas publicly because maybe others will join in. Thanks for playing along!
Fuck... Now that RAM prices are skyrocketing, we gonna see hoarders buy hundreds of TB of storage, leading to price hikes
The price for ~~restored~~ refurbished HDDs has already gone up compared to a year or so ago
Restored HDDs?
He probably means refurbished
Wasn’t all of this shit already available as torrents?
You could hand pluck grains of rice out of a field or just steal a 25lb bag.
Which would you prefer?
In case of this, definitely the grains.
Most of this rice is not worth stealing :D
When the 25lb bag is mostly weavels?
You of all people should know how to spell that word correctly.
deleting my account
No.
Not everything got torrented after music streamers came into prominence. (Though chances are pretty good you could rip an MP3 off Youtube for whatever you're looking for.)
Ripping Mp3s off YouTube Music Videos is without a doubt the worst method of piracy I've ever heard of.
Beat me to it, ya scalliwag.
That’s Captain have heard of it, mate.
Nope. Install metube on your PC. Give it the link to your YouTube music playlist. It will download the playlist into your local "Music" folder in the background. Install syncthing and it will push your music folder to your backups and phone.
To add new songs, simply save it on YouTube to your music playlist and it will be downloaded to your devices soon.
I think they're making a Pirates of the Caribbean joke.
And I am Guybrush Threepwood, mighty music pirate!
🦀🦀🦀🦀🦀🎊🎊🎉🎉🎊
Has anyone tried to self host this? Of course, hosting 300tb isn't practical, so any solution would need to download the metadata and songs on demand.
I think it's quite important from the perspective of media preservation. We basically have a snapshot of music from a time where it was mostly Human-Made.
I agree, but on the flip side this will 100% be used to train new music generation models lol...
Based
99% of the "original" Spotify's contents are crap or AI bloat. The rest was already on torrent before.
They break this down on their page, but while that's certainly true-ish for the last year or two the bulk of the collection is from before that.
BLESSED
But are the files high quality, or is it the same low bitrate garbage they send to their users ?
In between
It's not in MP3, it's Ogg vorbis and I don't have equipment and time to have an informed opinion about how good Ogg vorbis is at any given bit rate.
They claim usual people can't tell between what they do and better bit rates. I know I'm picky so I know I'll notice something.
Remember, this is Anna's Archive. The goal is preservation of human culture, not pirating.
ogg is better than mp3 at lower bit rates. As they increase in bit rate they become indistinguishable.
That said, this stuff is way too low quality to care about.
I like flac and I understand the human desire to get the best quality of everything. You can't tell the difference between 25mb flac and 2,5 mb 160 kbps opus though.
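The file sizes follow directly from bitrate times duration. A quick back-of-the-envelope sketch, ignoring container overhead and using decimal megabytes (both are simplifying assumptions); the function names are just illustrative.

```python
def stream_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate audio payload size in MB (10^6 bytes) for a constant bitrate."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


def duration_seconds(size_mb: float, bitrate_kbps: float) -> float:
    """Inverse: how long a track of size_mb lasts at the given bitrate."""
    return size_mb * 1_000_000 * 8 / (bitrate_kbps * 1000)


print(duration_seconds(2.5, 160))  # 125.0 seconds, about two minutes
print(stream_size_mb(160, 125))    # 2.5
print(stream_size_mb(75, 125))     # 1.171875
```

So the 2.5 MB figure corresponds to roughly a two-minute track at 160 kbps, and the same track at 75 kbps would be a bit under 1.2 MB.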
Alright, I will concede that Opus at 160kbps is really good. I always knew at low bitrates it was better than mp3 but I have not used it in awhile. I store everything in flac.
But if anyone else wants to compare: https://opus-bitrates.anthum.com/ lets you compare various bit rates of Opus. I think I hear a high end distortion in 160 that I don't hear in the 512, but it could be expectation.
I remember hearing something about Ogg vorbis being outmoded by some other Ogg? I don't know I've moved from mp3 to FLAC personally.
Yes, Ogg Vorbis's successor is Opus, which is amazing at low bitrates. That's why they used 75 kbit/s Opus for all files with popularity = 0, as opposed to the original 160 kbit/s Ogg Vorbis for all files with popularity > 0.
In my experience the container format is kinda irrelevant, the important questions are: Is it lossy or lossless? What's the bitrate?
Having listened to Spotify and Apple Music Lossless, there is definitely a noticeable difference. I have blind tested myself with the help of family members and I can always tell which is which after like 30 seconds, and I have terrible ears (tinnitus since birth)
Different compressions under same bit rate can be dramatically different. My favourite was aac 20 years ago. I don't think MP3 evolved significantly in this time frame, I was never a fan.
Spotify has bitrate options
They're talking about the files Anna's archive is torrenting.
Was there really much content on it that wasn't already available in a torrent somewhere already?
I would be very surprised if it wasn’t, at best it’s stuff no one bothered with and somehow I expect that won’t get torrented much either.
internet explorer bro