Spyke
lemmy.world

I have all of wikipedia in a single 156 GB text file. in my .zshrc i have fastWikiLookup() { cat ~/wikipedia.txt | grep "$@" }

51
Badabinski
kbin.earth

If you want a free and massive performance optimization, remove the cat:

fastWikiLookup() { grep "$@" ~/wikipedia.txt }

Reading and piping 156 GB of data to another process every time you want to look something up is a somewhat nontrivial action. Grep can directly read the file, which should result in a pretty damn good speed up.
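For anyone who wants to check this on their own machine, here is a minimal sketch that puts the two shapes side by side. It uses a tiny temporary file as a stand-in for ~/wikipedia.txt, and the names `sample_file`, `slow_lookup`, and `fast_lookup` are made up for illustration; they are not from the original post.

```shell
# Tiny stand-in for ~/wikipedia.txt (hypothetical sample data).
sample_file="$(mktemp)"
printf '%s\n' \
  'Cat: a small domesticated carnivore' \
  'Grep: a command-line search utility' \
  'Zsh: a Unix shell' > "$sample_file"

# Original shape: cat reads the file and pushes every byte through a pipe.
slow_lookup() { cat "$sample_file" | grep "$@"; }

# Suggested shape: grep opens the file itself, so there is no extra
# process and no pipe in the middle.
fast_lookup() { grep "$@" "$sample_file"; }

# Both forms should return the same lines; only the plumbing differs.
slow_result="$(slow_lookup -i 'shell')"
fast_result="$(fast_lookup -i 'shell')"
echo "$slow_result"
echo "$fast_result"
```

On a real 156 GB file, prefixing each call with `time` should show whether the difference is noticeable on your hardware; the result will depend on the disk and on how much of the file is already cached.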

59

It was obvious and I was being a bit of a dummy this morning. Mea culpa.

13
bamboo
lemmy.blahaj.zone

Considering the community, I think catting 156 GB to grep and calling it fastWikiLookup is a subtle joke about how absurd this is.

17
nesc
lemmy.cafe

How fast is it, really? How do you differentiate between topics?

9
lemmy.world

the topic differentiation technology doesn't exist yet, so i just hit ctrl+c about a second after i hit enter
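A rough way to automate the ctrl+c step, assuming a grep that supports the `-m` (max count) option, as GNU and BSD grep do: grep stops reading once it has printed that many matching lines. The sample file and variable names below are hypothetical.

```shell
# Tiny stand-in for the big text file (hypothetical sample data).
sample_file="$(mktemp)"
printf '%s\n' \
  'Mercury (planet)' \
  'Mercury (element)' \
  'Mercury (mythology)' > "$sample_file"

# -m 2: stop after the first two matching lines instead of scanning on.
first_hits="$(grep -m 2 'Mercury' "$sample_file")"

# Count the lines that came back (tr strips padding some wc versions add).
hit_count="$(printf '%s\n' "$first_hits" | wc -l | tr -d ' ')"
echo "$first_hits"
```

This only caps how many matches come back; it still doesn't tell the topics apart.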

18

lol database engineers who’ve built very complex systems and ingest and query mechanisms across the world all of a sudden got very mad at your comment and they’re not sure why

12
lemmy.world

When I have to poop, I scroll through the Wikipedia app; I'm done scrolling reels/shorts. The front page is well curated and entertaining, and at least I don't have to deal with sloppy and fake content.

41
lemmy.world

There's a wiki app?!

Fun fact, if you search for a term but start with !w you will get the wiki page.

10
lemmy.zip

Serious question - since Wikipedia doesn't serve ads, wouldn't a drop in direct traffic be a good thing for them? It would reduce their server costs and presumably the people who truly value Wikipedia (contributors and donors) would still use it for its intended purpose.

Ignoring the population as a whole getting dumber, which seems to be a side effect of everything these days.

29

My concern would be losing market share to other sources, resulting in Wikipedia slowly being forgotten by future generations. Then come the downstream effects of fewer new contributions and increasingly consolidated moderators.

Then not being at the forefront of the Internet hurts prestige, which reduces donations. I agree with your point on ad revenue vs server costs, but I imagine a 20% reduction in traffic hurts their long-term strategy more than a 20% reduction in hosting costs helps.

23
dubvee.org

Technically, yeah. But with fewer people going to Wikipedia directly, there would probably be less chance of getting any new contributors. I'm not sure how the foundation gets all its money, but the more traffic they serve, the more they can prove their relevance, which might matter for funding.

10

Maybe I'm cynical, but the type to use just the AI overview and the type donating to Wikipedia probably don't have that much overlap.

4

Wikipedia relies on people reading it and realising "Wait, that's wrong..." to fix inaccuracies.

Recently some PR company was caught taking money to whitewash the Wikipedia pages of their clients. The more people that are looking at the pages, the more likely it is that someone will realise they are being manipulated maliciously.

7
dubvee.org

Lol, "PatrickGPT" would be the easiest novelty chatbot to write, too.

10 INPUT "ASK ME ANYTHING>"; Q$
20 PRINT "[THINKING...]"
30 SLEEP 3
40 PRINT "THAT'S A STUPID QUESTION"
50 GOTO 10

22
cmac
lemmy.world

47 GB for text only, 111 GB with images (both only English articles)

13

I'm on Linux because I hate Windows and Microslop, not because I love Linux.

While I'm guilty of sometimes stopping at the Google summary, which I see in Startpage searches, I usually do click through because the summary often doesn't show what I want to know. Usually what I want to know isn't the primary information, but something more obscure, or some possible connection.

7

Your comment answered my question, which was how long it has gone without being updated. August 2025 was fairly recent, but I'm betting a number of articles are out of date at this point in time. For general reference an offline copy is good, and it helps to keep an archive. But I wouldn't use it on the regular for standard references.

2
