If you could standardise a file format for a specific task, what would you pick and why?

One could say it is the standard comic for these kinds of discussions.

n00b001 reply

There are too many of these comics, I'll make one to be the true comic response and unite all the different competing standards

dingleberry reply

🪛

raubarno

Open Document Standard (.odt) for all documents. In all public institutions (it's already a NATO standard for documents).

Because the Microsoft Word ones (.doc, .docx) are unusable outside the Microsoft Office ecosystem. I feel outraged every time I need to edit a .docx file because the layout breaks so easily. And some older .doc files do not even work with Microsoft Word.

Actually, IMHO, there should be some better alternative to .odt as well. Something in a more declarative/scripted fashion, like LaTeX, but still WYSIWYG. LaTeX (and XeTeX, for my use cases) is too messy for me to work with, especially when a package is Byzantine. And it can be non-reproducible if I share or reuse the same document somewhere else.

Something has to be made with document files.


schnurrito reply

Markdown, asciidoc, restructuredtext are kinda like simple alternatives to LaTeX

d_k_bo reply

There is also https://github.com/typst/typst/

monobot reply

It is unbelievable that we do not have a standard document format.

DigitalJacobin reply

There are now 3 competing standards.

What's messed up is that, technically, we do. Originally, OpenDocument was the ISO standard document format. But then, baffling everyone, Microsoft got the ISO to also have .docx as an ISO standard. So now we have 2 competing document standards, the second of which is simply worse.

The Ramen Dutchman reply

ttrpg.network

That's awful, we should design something that covers both use cases!

megane-kun reply

I was too young to use it in any serious context, but I kinda dig how WordPerfect does formatting. The formatting codes are hidden by default, but you can show them and manipulate them as needed.

It might already be a thing, but I imagine a LaTeX-based standard for document formatting would do well with a WYSIWYG editor that hides the complexity by default but keeps it available for those who need to manipulate it.

raubarno reply

There are programs (LyX, TeXmacs) that implement WYSIWYG for LaTeX; TeXmacs is exceptionally good. I don't know about the standards, though.

Another problem with LaTeX and most of the other document formats is that they are so bloated and depend on many other tasks that it is hardly possible to embed the tool into a larger document. That's a bit of criticism for UNIX design philosophy, as well. And LaTeX code is especially hard to make portable.

There used to be a similar situation with PDFs: it was really hard to display a PDF embedded in an application. Finally, Firefox pdf.js came in and solved that issue.

The only embedded and easy-to-implement standard that describes a 'document' is HTML, for now (with JavaScript for scripting). Only that it's not aware of page layout. If only there were an extension standard that could turn an HTML page into a document...

megane-kun reply

I was actually thinking of something like markdown or HTML forming the base of that standard. But it's almost impossible (is it?) to do page layout with either of them.

But yeah! What I was thinking when I mentioned a LaTeX-based standard is to have a base set of "modules" (for lack of a better term) that everyone should have and that would guarantee interoperability. It should be possible to create a document with the exact layout one wants with just the base standard functionality, so that things won't be broken when opening up a document in a different editor.

There could be additional modules to facilitate things, but nothing like the 90's proprietary IE tags. The way I'm imagining this is that the additional modules would work on the base modules, making things slightly easier but that they ultimately depend on the base functionality.

IDK, it's really an idea that probably won't work upon further investigation, but I just really like the idea of an open standard for documents based on LaTeX (kinda like how HTML has been for web pages), where you could work on it as a text file (with all the tags) if needed.

MonkderZweite reply

Finally, Firefox pdf.js came in and solved that issue.

Which uses a bloated and convoluted scripting format specialized for manipulating HTML.

DigitalJacobin reply

True, but it offered a much more secure alternative to opening up PDFs locally.

MonkderZweite reply

I don't think so. pdf.js gets a new XSS CVE every few months, which is a web-only thing. And if you use anything other than Adobe Reader/Acrobat...

erogenouswarzone reply

Bro, trying to give padding in Ms word, when you know... YOU KNOOOOW... they can convert to html. It drives me up the wall.

And don't get me started on excel.

Kill em all, I say.

DigitalJacobin

This is the kind of thing I think about all the time, so I have a few.

Archive files: .tar.zst
- Produces better compression ratios than the DEFLATE compression algorithm (used by .zip and gzip/.gz) and does so faster.
- By separating the jobs of archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of "Make each program do one thing well.".
- .tar.xz is also very good and seems more popular (probably since it was released 6 years earlier in 2009), but, when tuned to its maximum compression level, .tar.zst can achieve a compression ratio pretty close to LZMA (used by .tar.xz and .7z) and do it faster¹.
  
  zstd and xz trade blows in their compression ratio. Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup.
Image files: JPEG XL/.jxl
- "Why JPEG XL"
- Free and open format.
- Can handle lossy images, lossless images, images with transparency, images with layers, and animated images, giving it the potential of being a universal image format.
- Much better quality and compression efficiency than current lossy and lossless image formats (.jpeg, .png, .gif).
- Produces much smaller files for lossless images than AVIF²
- Supports much larger resolutions than AVIF's 9-megapixel limit (important for lossless images).
- Supports up to 24-bit color depth, much more than AVIF's 12-bit color depth limit (which, to be fair, is probably good enough).
Videos (Codec): AV1
- Free and open format.
- Much more efficient than x264 (used by .mp4) and VP9³.
Documents: OpenDocument / ODF / .odt
- raubarno says it best here. .odt is simply a better standard than .docx.
"it's already a NATO standard for documents"
"Because the Microsoft Word ones (.doc, .docx) are unusable outside the Microsoft Office ecosystem. I feel outraged every time I need to edit a .docx file because the layout breaks so easily. And some older .doc files do not even work with Microsoft Word."


lloram239 reply

tal reply

https://github.com/vasi/pixz

.tar is pretty bad as it lacks an index, making it impossible to quickly seek around in the file.

.tar.pixz/.tpxz has an index, uses LZMA, and permits parallel compression/decompression (increasingly important on modern processors).

It's packaged in Debian, and I assume other Linux distros.

Only downside is that GNU tar doesn't have a single-letter shortcut to use pixz as a compressor, the way it does "z" for gzip, "j" for bzip2, or "J" for xz (LZMA); gotta use the more-verbose "-Ipixz".

Also, while I don't recommend it, IIRC gzip has a limited range over which the effects of compression can propagate, and so even if you aren't intentionally trying to provide random access, there is software that leverages this to hack in random access as well. I don't recall whether someone has rigged it up with tar and indexing, but I suppose if someone were specifically determined to use gzip, one could go that route.

jackpot reply

By separating the jobs of archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of "Make each program do one thing well.".

wait so does it do all of those things?

DigitalJacobin reply

So there's a tool called tar that creates an archive (a .tar file). Then there's a tool called zstd that can be used to compress files, including .tar files, which then become .tar.zst files. And then you can encrypt your .tar.zst file using a tool called gpg, which would leave you with an encrypted, compressed .tar.zst.gpg archive.

Now, most people aren't doing everything in the terminal, so the process for most people would be pretty much the same as creating a ZIP archive.
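For anyone who wants to see the "separate jobs" idea outside the terminal, here is a minimal Python sketch. It assumes xz (the standard-library `lzma` module) as a stand-in for zstd, since zstd bindings are not in the standard library of most Python versions, and it leaves the gpg step out entirely. The file names and the helpers `make_tar` and `read_tar` are made up for illustration.

```python
import io
import lzma
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Step 1, archiving only: bundle files into an uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar(raw: bytes) -> dict[str, bytes]:
    """Reverse of make_tar: read every regular file back out of a tar stream."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }


# Hypothetical example files, not taken from the thread.
files = {"notes.txt": b"hello " * 1000, "data.csv": b"a,b,c\n" * 1000}

tar_bytes = make_tar(files)              # archive (.tar)
compressed = lzma.compress(tar_bytes)    # compress (.tar.xz here; assumption: xz instead of zstd)

# Undo the layers from the outside in: decompress first, then unpack.
restored = read_tar(lzma.decompress(compressed))
print(len(tar_bytes), "->", len(compressed), "bytes")
```

The compressor never needs to know it is looking at a tar stream, which is the whole point of keeping the two jobs apart.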

Laser reply

By separating the jobs of archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of “Make each program do one thing well.”.

The problem here being that GnuPG does nothing really well.

Videos (Codec): AV1

Much more efficient than x264 (used by .mp4) and VP9[3].

AV1 is also much younger than H264 (AV1 is a specification, x264 is an implementation), and only recently have software-encoders become somewhat viable; a more apt comparison would have been AV1 to HEVC, though the latter is also somewhat old nowadays but still a competitive codec. Unfortunately currently there aren't many options to use AV1 in a very meaningful way; you can encode your own media with it, but that's about it; you can stream to YouTube, but YouTube will recode to another codec.

DigitalJacobin reply

The problem here being that GnuPG does nothing really well.

Could you elaborate? I've never had any issues with gpg before and curious what people are having issues with.

Unfortunately currently there aren’t many options to use AV1 in a very meaningful way; you can encode your own media with it, but that’s about it; you can stream to YouTube, but YouTube will recode to another codec.

AV1 has almost full browser support (iirc) and companies like YouTube, Netflix, and Meta have started moving over to AV1 from VP9 (since AV1 is the successor to VP9). But you're right, it's still working on adoption, but this is moreso just my dreamworld than it is a prediction for future standardization.

Laser reply

Could you elaborate? I’ve never had any issues with gpg before and curious what people are having issues with.

This article and the blog post linked within it summarize it very well.

tal reply

Encrypting Email

Don’t. Email is insecure. Even with PGP, it’s default-plaintext, which means that even if you do everything right, some totally reasonable person you mail, doing totally reasonable things, will invariably CC the quoted plaintext of your encrypted message to someone else.

Okay, provide me with an open standard that is widely-used that provides similar functionality.

It isn't there. There are parties who would like to move email users into their own little proprietary walled gardens, but not a replacement for email.

The guy is literally saying that encrypting email is unacceptable because it hasn't been built from the ground up to support encryption.

I mean, the PGP guys added PGP to an existing system because otherwise nobody would use their nifty new system. Hell, it's hard enough to get people to use PGP as it is. Saying "well, if everyone in the world just adopted a similar-but-new system that is more-amenable to encryption, that would be helpful", sure, but people aren't going to do that.

Laser reply

The message to be taken from here is rather "don't bother", if you need secure communication use something else, if you're just using it so that Google can't read your mail it might be ok but don't expect this solution to be secure or anything. It's security theater for the reasons listed, but the threat model for some people is a powerful adversary who can spend millions on software to find something against you in your communication and controls at least a significant portion of the infrastructure your data travels through. Think about whistleblowers in oppressive regimes, it's absolutely crucial there that no information at all leaks. There's just no way to safely rely on mail + PGP for secure communication there, and if you're fine with your secrets leaking at one point or another, you didn't really need that felt security in the first place. But then again, you're just doing what the blog calls LARPing in the first place.

DigitalJacobin reply

MonkderZweite reply

.odt is simply a better standard than .docx.

No surprise, since OOXML is barely even a standard.

jackpot reply

is av1 lossy

DigitalJacobin reply

AV1 can do lossy video as well as lossless video.

piexil reply

I get a better compression ratio with xz than with zstd, both at their highest levels, when building an Ubuntu SquashFS image.

Zstd is way faster though

jackpot reply

wait, I'm confused, what's the difference between .tar.zst and .tar.xz?

DigitalJacobin reply

Different ways of compressing the initial .tar archive.

kadu reply


Longpork_afficianado reply

lemmy.nz

But it's not a tarxz, it's an xz containing a tar, and you perform operations from right to left until you arrive back at the original files with whatever extensions they use.

If I compress an exe into a zip, would you expect that to be an exezip? No, you expect it to be file.exe.zip, informing you(and your system) that this file should first be unzipped, and then should be executed.
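The right-to-left reading can be sketched in a few lines of Python using `pathlib` from the standard library. The helper name `unwrap_order` is made up for this example, and the splitting is deliberately naive: it treats every dot-separated piece after the first as an extension.

```python
from pathlib import PurePath


def unwrap_order(filename: str) -> list[str]:
    """Return the extensions in the order you would undo them (outermost first)."""
    return list(reversed(PurePath(filename).suffixes))


print(unwrap_order("backup.tar.xz"))      # outer layer .xz first, then .tar
print(unwrap_order("file.exe.zip"))       # unzip first, then you have the .exe
print(PurePath("backup.tar.xz").suffix)   # only the part after the last dot
```

The last line shows what a tool sees if it only looks at the final extension: just `.xz`, with the `.tar` layer only becoming relevant after decompression.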

kadu reply


7eter reply

Dots in filenames are commonly used in any operating system like name_version.2.4.5.exe or similar... So I don't see a problem.

kadu reply


kraniax reply

lemmy.wtf

use a real operating system then

DigitalJacobin reply

Sounds like a Windows problem

kadu reply

DigitalJacobin reply

I get the frustration, but Windows is the one that strayed from convention/standard.

Also, I should've asked this earlier, but doesn't Windows also only look at the characters following the last dot in the filename when determining the file type? If so, then this should be fine for Windows, since there's only one canonical file extension at a time, right?

kadu reply

Gamma reply

I get your point. Since a .tar.zst file can be handled natively by tar, using .tzst instead does make sense.

Spore reply

There already are conventional abbreviations: see Section 2.1. I doubt they will be better supported by tools though.

kadu reply


jaaval reply

In this case it really seems this windows convention is bad though. It is uninformative. And abbreviations mandate understanding more file extensions for no good reason. And I say this as primarily a windows user. Hiding file extensions was always a bad idea. It tries to make a simple reduced UI in a place where simple UI is not desirable. If you want a lean UI you should not be handling files directly in the first place.

Example.zip from the other comment is not a compressed .exe file, it's a compressed archive containing the exe file and some metadata. Windows standard tools would be in real trouble trying to understand compressed files that are not archives, which many programs might want to use for logging or other data dumps. And that means a lot of software uses its own custom extensions that neither the system nor the user knows what to do with without the original software. Using standard system tools and conventions is generally preferable.

sebsch reply

I would argue what windows does with the extensions is a bad idea. Why do you think engineers should do things in favour of these horrible decisions the most insecure OS is designed with?

ronweasleysl reply

Damn didn't realize that JXL was such a big deal. That whole JPEG recompression actually seems pretty damn cool as well. There was some noise about GNOME starting to make use of JXL in their ecosystem too...

Björn

swg-empire.de

zip or 7z for compressed archives. I hate that for some reason rar has become the de facto standard for piracy. It's just so bad.

The other day I saw a tar.gz containing a multipart-rar which contained an iso which contained a compressed bin file with an exe to decompress it. Soooo unnecessary.

Edit: And the decompressed game of course has all of its compressed assets in renamed zip files.

Aqarius reply

A .tarducken, if you will.

notfromhere reply

Ziptarar?

Bye reply

It was originally rar because it’s so easy to separate into multiple files. Now you can do that in other formats, but the legacy has stuck.

aksdb reply

Not just that. RAR also has recovery records.

seaQueue reply

.tar.zstd all the way IMO. I've almost entirely switched to archiving with zstd, it's a fantastic format.

xinayder reply

infosec.pub

why not gzip?

raubarno reply

Gzip is slower and produces larger output. Zstandard, on the other hand, is far faster than any of the existing standards in terms of compression speed; this is its killer feature. It also provides a somewhat better compression ratio than gzip ^citation_needed^.

Supermariofan67 reply

Yes, for every compression level of gzip there is some zstd compression level that is both faster and better in compression ratio.

Additionally, the highest compression levels of zstd are comparable in compression ratio to LZMA while also being slightly faster in compression and many, many times faster in decompression.

seaQueue reply

gzip is very slow compared to zstd for similar levels of compression.

The zstd algorithm is a project by the same author as lz4. lz4 was designed for decompression speed, zstd was designed to balance resource utilization, speed and compression ratio and it does a fantastic job of it.

Turun reply

The only annoying thing is that the extension for zstd compression is zst (no d). Tar does not recognize a zstd extension, only zst is automatically recognized and decompressed. Come on!

seaQueue reply

If we're being entirely honest just about everything in the zstd ecosystem needs some basic UX love. Working with .tar.zst files in any GUI is an exercise in frustration as well.

I think they recently implemented support for chunked decoding so reading files inside a zstd archive (like, say, seeking to read inside tar files) should start to improve sooner or later but some of the niceties we expect from compressed archives aren't entirely there yet.

Fantastic compression though!

MonkderZweite reply

-I option?

Turun reply

Not sure what that does.

Yes, you can use options to specify exactly what you want. But it should recognize .zstd as zstandard compression instead of going "I don't know what this compression is". I don't want to have to specify the obvious extension just because I typed zstd instead of zst when creating the file.

KSP Atlas reply

.tar.xz masterrace

d_k_bo reply

This comment didn't age well.

Infernal_pizza

Literally any file format except PDF for documents that need to be edited. Fuck Adobe and fuck Acrobat

ElectricMachman reply

Isn't the point of PDF that it can't (or, perhaps more accurately, shouldn't) be edited after the fact? It's supposed to be immutable.

tal reply

Unless you have explicitly digitally-signed the PDF, it's not immutable. It's maybe more-annoying to modify, but one shouldn't rely on that.

And there are ways to digitally-sign everything, though not all viewing software has incorporated signature verification.

Infernal_pizza reply

I’m not sure if they were ever designed to be immutable, but that’s what a lot of people use it for because it’s harder to edit them. But there are programs that can edit PDFs. The main issue is I’m not aware of any free ones, and a lot of the alternatives don’t work as well as Adobe Acrobat which I hate! It’s always annoying at work when someone gets sent a document that they’re expected to edit and they don’t have an Acrobat license!

danilolc reply

lemmy.eco.br

I've already edited some pdfs with LibreOffice writer. I don't know if it's suitable for that, but it worked for me

tobbue reply

PDFs can contain a vast amount of different image information, but good software that can edit vector data often opens PDFs for editing easily. It might convert non-embedded fonts into paths and rasterize some transparency effects, though. So Inkscape might work.

Infernal_pizza reply

I’m assuming that will work similar to Microsoft Word where it’s fine for basic PDFs but if there are a lot of tables or images it can mess up the document?

DogMuffins reply

think of it as though pdf is the container - it can contain all sorts of different data. I'd say you got real lucky being able to edit one with Writer without issues.

danilolc reply

lemmy.eco.br

I've confused the name, It was LibreOffice Draw, not Writer

Natanael reply

slrpnk.net

No, it's to preserve formatting when distributed. Editing is absolutely possible and always was; it's just annoying to parse the structure when trying to preserve the format as you make changes.

DogMuffins reply

No, although there's probably a culture or convention around that.

Originally the idea was that it's a format which can contain fonts and other things so it will be rendered the same way on different devices even if those devices don't have those fonts installed. The only reason it's not commonly editable that I'm aware of is that it's a fairly arcane proprietary spec.

Now we have the openspec odt which can embed all the things, so pdf editing just doesn't really seem to have any support.

The established conventions around pdfs do kind of amaze me. Like contracts get emailed for printing & signing all the time. In many cases it would be trivial to edit the pdf and return your edited copy which the author is unlikely to ever read.

WetBeardHairs reply

Hold on. I'm applying for a mortgage and I want the bank to pay off my loan for me after 6 months of payments.

SnowdenHeroOfOurTime reply

unilem.org

Why would you use acrobat? I haven't used it in many years and use PDFs all the time

Infernal_pizza reply

What do you use?

SnowdenHeroOfOurTime reply

unilem.org

Depends on the platform I'm on. There are so many options. SumatraPDF on windows, whatever default app pop os has, preview on Mac, builtin android PDF viewer. I assume you're on windows because you mentioned acrobat. There are several options beside sumatra. I think many are decent.

Infernal_pizza reply

Ah I was more looking for alternative editors rather than viewers, I usually just use my web browser to view them

SnowdenHeroOfOurTime reply

unilem.org

Ah, yeah I normally would only need to do that in the context of signing a contract, which I do using Gimp or Photoshop.

Have you tried these? https://www.lifewire.com/best-free-pdf-editors-4147622

Infernal_pizza reply

I have not, I’ll give some of them a try!

piexil reply

Firefox can edit PDFs , although I wouldn't be surprised if it's not in depth

iegod reply

Is foxit still around? I didn't mind that one on windows.

joel_feila reply

Yup, and it's also one of the best for Linux.

Phoenixz reply

lemmy.ca

Okular, but that's Linux

DogMuffins reply

Acrobat Reader is actually great for filling out forms.

Even if the "pdf" is actually just a potato quality photo of what was at some time a form, you can still fill it out in Acrobat Reader.

Generally in windows I prefer sumatra pdf as a reader, but I keep acrobat around for this purpose.

Supermariofan67

Ogg Opus for all lossy audio compression (mp3 needs to die)

7z or tar.zst for general purpose compression (zip and rar need to die)

dinckelman reply

Rust Buckett reply

mastodon.social

@dinckelman @Supermariofan67 I think you mean unsecure. It doesn't feel unsure of itself. 😁


hungprocess reply

https://www.thefreedictionary.com/insecure

in·se·cure (ĭn′sĭ-kyo͝or′) adj.

Inadequately guarded or protected; unsafe: A shortage of military police made the air base insecure.

Unsecure

a. 1. Insecure.

https://www.thefreedictionary.com/Unsecure

Rust Buckett reply

mastodon.social

@hungprocess touché.

tal reply

@hungprocess Also this. https://english.stackexchange.com/questions/19653/insecure-or-unsecure-when-dealing-with-security

One thing I didn't appreciate about English until reading a Europe forum for a while is that it has a lot of different prefixes that mean something like "not", and this is not very intuitive to people learning the language. Their use is not regular.

Consider:

"a-" as in "atypical"
"non-" as in "nonconsensual"
"un-" as in "uncooperative"
"im-" as in "immortal"
"in-" as in "inconsiderate"
"il-" as in "illegitimate"
"mal-" as in "maladjusted"
"anti-" as in "anti-establishment"
"de-" as in "deconstruct"

And sometimes, some of the prefixes are associated with base words to form real words with similar meanings, but meanings that are not the same. For example, "immoral" and "amoral" do not mean the same thing, though they have related meanings.

Rust Buckett reply

mastodon.social

It seems that I was quite wrong, but that a lot of other people are wrong as well. lol

jackpot reply

why do zip and rar need to die?

Supermariofan67 reply

Zip has terrible compression ratio compared to modern formats, it's also a mess of different partially incompatible implementations by different software, and also doesn't enforce utf8 or any standard for that matter for filenames, leading to garbled names when extracting old files. Its encryption is vulnerable to a known-plaintext attack and its key-derivation function is very easy to brute force.
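The garbled-filename problem is easy to reproduce. The original ZIP format assumed the CP437 code page for names and only later gained a flag bit to mark a name as UTF-8, so a reader that sees UTF-8 bytes without the flag will decode them with the wrong table. The sketch below only simulates that mismatch on a string; it does not build a real archive, and the file name and the helper `misread_as_cp437` are made up for illustration.

```python
def misread_as_cp437(name: str) -> str:
    """Simulate a reader that assumes CP437 for a name stored as UTF-8 bytes."""
    return name.encode("utf-8").decode("cp437")


original = "Übersicht.txt"            # hypothetical file name with a non-ASCII letter
garbled = misread_as_cp437(original)  # what the user sees after extraction

# The damage is reversible only if you know which wrong table was applied.
recovered = garbled.encode("cp437").decode("utf-8")

print(original, "->", garbled, "->", recovered)
```

In practice the extractor has to guess which encoding the creating tool used, and that guess is where old archives go wrong.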

Rar is proprietary. That alone is reason enough not to use it. It's also very slow.

PlexSheep reply

How about tar.gz? How does gzip compare to zstd?

Supermariofan67 reply

Both slower and worse at compression at all its levels.

Aatube reply

kbin.social

What’s wrong with mp3

Knusper reply

Big file size for rather bad audio quality.

folkrav reply

Supermariofan67 reply

People are able to on some songs because mp3 is poorly optimized for certain sounds, especially cymbals. However, opus can achieve better quality than that at 128k with fewer outliers than mp3 at 320k, which saves a lot of space.

folkrav reply

Knusper reply

We're not talking lossless. The comment above specified Opus-encoded OGG, which is lossy.

For example, I converted my music library from MP3 to OGG Opus and the size shrank from 16 GB to just 3 GB.

And if converting from lossless to both MP3 and OGG Opus, then OGG does sound quite a bit better at smaller file sizes.

So, the argument here is that musicians are underselling their art by primarily offering MP3 downloads. If the whole industry would just magically switch to OGG Opus, that would be quite an improvement for everyone involved.

folkrav reply

Knusper reply

Well, I understood this post to mean, if you had a wish, what would you wish for? Not necessarily that it's realistic...

I do agree with your points. Although, I can't help but feel like more people would prefer local files, if those actually sounded better than the bandwidth-limited streaming services.

folkrav reply

tal reply

I think that people overstate MP3's losses, and I agree at 320k that it's inaudible, but I can or at least have been able to tell at 128k, mostly with cymbals. Granted, cymbals aren't that common, but it's nice to not have them sound muddy. And, honestly, there just isn't a lot of reason to use MP3 for anything compressed today, other than maybe hardware decoding on very small devices and widespread support. There are open standards that are better.

folkrav reply

jackpot reply

why does mp3 need to die?

Supermariofan67 reply

It's a 30-year-old format, and large amounts of research and innovation in lossy audio compression have occurred since then. Opus can achieve better quality at something like 40% of the bitrate. Also, the format was, much like zip, a mess of partially broken implementations in the early days (although now everyone uses LAME, so it's not as big of a deal). Its container/stream format is very messy too. There's also no native tag format, so it needs ID3 tags, which don't enforce any standardized text encoding.

drwankingstein reply

It's worth noting that AAC is actually pretty good in a lot of cases too.

Supermariofan67 reply

However, it is very patent encumbered and therefore wouldn't make for a good standard.

drwankingstein reply

AAC-LC and HE-AAC are both free now; HE-AAC v2 and xHE-AAC aren't, but those have more limited use.

MonkderZweite reply

How about xz compared to zstd?

Supermariofan67 reply

At both algorithms' highest levels, xz seems to be on average a few percent better at compression ratio, but zstd is a bit faster at compression and much much faster at decompression. So if your goal is to compress as much as possible without regard to speed at all, xz -9 is better, but if you want compression that is almost as good but faster, zstd --long -19 is the way to go

At the lower compression presets, zstd is both faster and compresses better
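If you want to check claims like these on your own data, a small harness is enough. The sketch below is Python and only uses the standard library, which means it compares DEFLATE (`zlib`, the algorithm behind gzip) with xz (`lzma`) and does not include zstd at all; adding zstd would need a third-party binding or a newer Python, so treat that as an exercise. The sample data and the `measure` helper are made up, and the numbers depend heavily on the input.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Compress and decompress once, returning ratio and wall-clock timings."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip failed")
    return {
        "name": name,
        "ratio": len(data) / len(packed),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


# Made-up, log-like sample data; real files will behave differently.
sample = b"".join(
    b"sensor=%d value=%d status=OK\n" % (i % 50, i * 7) for i in range(20000)
)

results = [
    measure("DEFLATE level 9", lambda d: zlib.compress(d, 9), zlib.decompress, sample),
    measure("xz default preset", lzma.compress, lzma.decompress, sample),
]

for r in results:
    print(
        f"{r['name']:<18} ratio {r['ratio']:.1f}x  "
        f"compress {r['compress_s']:.3f}s  decompress {r['decompress_s']:.3f}s"
    )
```

A single run on one synthetic input proves very little; for anything serious, use your real files and repeat the timing several times.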

d_k_bo

https://jpegxl.info/why-jxl.html

JPEG-XL for rasterized images.

GamingChairModel reply

I agree.

I especially love that it addresses the biggest pitfall of the typical "fancy new format does things better than the one we're already using" transition, in that it's specifically engineered to make migration easier, by allowing a lossless conversion from the dominant format.

hikaru755 reply

Never heard of that, thanks for bringing it to my attention!

kadu reply

d_k_bo reply

GNOME introduced its support in version 45, AFAIK there isn't a stable distro release yet that ships it.

DigitalJacobin reply

Unfortunately, adoption has been slow and Alliance for Open Media are pushing back somewhat (especially Google¹, who leads the group) in favor of their inferior .avif format.

Footnotes

https://www.phoronix.com/news/Chrome-Drops-JPEG-XL ↩

RoyaltyInTraining reply

How does it compare to AVIF?

d_k_bo reply

AVIF is slower, has a way smaller maximum resolution and doesn't support progressive decoding as well as lossless JPEG recompression.

RoyaltyInTraining reply

Oh damn, that resolution limit is a total deal breaker. Can't believe anyone would release a format with those limitations today...

IsoKiero

I don't know what to pick, but something other than PDF for the task of transferring documents between multiple systems. And yes, I know, PDF has its strengths and there's a reason why it's so widely used, but it doesn't mean I have to like it.

Additionally, all proprietary formats, especially ones that have gained enough users that they're treated like a standard or a requirement if you want to work with X.

kkard2 reply

oh it's x, not x... i hate our timeline

StarkillerX42 reply

I would be fine with PDFs exactly the same except Adobe doesn't exist and neither does Acrobat.

darklamer reply


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍

midwest.social

Resume information. There have been several attempts, but none have become an accepted standard.

When I was a consultant, this was the one standard I longed for the most. A data file where I could put all of my information, and then filter and format it for each application. But ultimately, I wanted to be able to submit the information in a standardised format - without having to re-enter it endlessly into crappy web forms.

I think things have gotten better today, but at the cost of a reliance on a monopoly (LinkedIn). And I'm not still in that sort of job market. But I think that desire was so strong it'll last me until I'm in my grave.
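To make the "one data file, filtered and formatted per application" idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the field names, the tags, and the helpers `tailor` and `render_text` are invented for this example and do not follow any existing resume standard.

```python
import json

# Hypothetical master record; a real standard would define these fields.
resume = {
    "name": "Example Person",
    "entries": [
        {"year": 2019, "title": "Built a billing API", "tags": ["backend", "python"]},
        {"year": 2021, "title": "Led a warehouse migration", "tags": ["data", "sql"]},
        {"year": 2022, "title": "Ran on-call training", "tags": ["ops"]},
    ],
}


def tailor(data, wanted_tags):
    """Keep only the entries that share at least one tag with the application."""
    wanted = set(wanted_tags)
    kept = [entry for entry in data["entries"] if wanted & set(entry["tags"])]
    return {**data, "entries": kept}


def render_text(data):
    """One possible output format; the same data could feed a PDF or a web form."""
    lines = [data["name"]]
    lines += [f"- {entry['year']}: {entry['title']}" for entry in data["entries"]]
    return "\n".join(lines)


tailored = tailor(resume, ["python", "sql"])
print(render_text(tailored))
print(json.dumps(tailored, indent=2))  # the machine-readable version to submit
```

The filtering and the formatting stay separate, so the same master file can be submitted as data or rendered for a human reader.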

jackpot reply

https://sopuli.xyz/comment/3385727

sunbeam60

SQLite for all “I’m going to write my own binary format because I is haxor” jobs.

There are some specific cases where SQLite isn’t appropriate (streaming). But broadly it fits in 99% of cases.
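As a minimal sketch of what "SQLite as the file format" can look like, here is a Python example using the standard-library `sqlite3` module. The table layout, the file name, and the helpers `save_project` and `load_project` are made up for illustration; a real application would design its own schema.

```python
import os
import sqlite3
import tempfile


def save_project(path, title, notes):
    """Write the application's data into a single SQLite file."""
    con = sqlite3.connect(path)
    with con:  # commits on success, rolls back on error
        con.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
        con.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )
        con.execute("INSERT OR REPLACE INTO meta VALUES ('title', ?)", (title,))
        con.executemany("INSERT INTO notes (body) VALUES (?)", [(n,) for n in notes])
    con.close()


def load_project(path):
    """Read the data back; any SQLite tool could open the same file."""
    con = sqlite3.connect(path)
    title = con.execute("SELECT value FROM meta WHERE key = 'title'").fetchone()[0]
    notes = [row[0] for row in con.execute("SELECT body FROM notes ORDER BY id")]
    con.close()
    return title, notes


# Hypothetical "document" saved to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "project.db")
save_project(path, "Demo", ["first note", "second note"])
loaded = load_project(path)
print(loaded)
```

You get transactions, a documented on-disk format, and existing tooling without writing a parser of your own.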

seaQueue reply

To chase this - converting to json or another standardized format in every single case where someone is tempted to write their own custom parser. Never write custom parsers kids, they're an absolutely horrible time-suck and you'll be fixing them quite literally forever as you discover new and interesting ways for your input data to break them.

Edit: it doesn't have to be json, I really don't care what format you use, just pick an existing data format that uses a robust, thoroughly tested, parser.
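A small sketch of the failure mode, assuming Python and its standard `json` module. The comma-separated `key=value` format and the `naive_dump` / `naive_parse` helpers are hypothetical stand-ins for a typical home-grown parser.

```python
import json

# A record with the kind of input that tends to break ad hoc formats:
# a comma, quotes and a newline inside the values.
record = {"name": 'Widget, "large"', "note": "line one\nline two", "price": 9.5}


def naive_dump(data):
    """Hypothetical home-grown format: key=value pairs joined by commas."""
    return ",".join(f"{key}={value}" for key, value in data.items())


def naive_parse(text):
    """Matching home-grown parser; it has no notion of escaping."""
    return dict(pair.split("=", 1) for pair in text.split(","))


# The standard parser round-trips the record unchanged.
decoded = json.loads(json.dumps(record))

# The home-grown one trips over the comma inside the name.
try:
    naive_result = naive_parse(naive_dump(record))
except ValueError as exc:
    naive_result = f"parse error: {exc}"

print(decoded == record)
print(naive_result)
```

Escaping rules are exactly the part that an existing, well-tested parser has already gotten right.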

taladar reply

To add to that. Configuration file formats...just pick a standard one, do not write your own.

And while we are at it, if there is even a remote chance that you have a "we will do everything declaratively" idea, just use an existing programming language for your file format instead of painfully growing some home-grown add-ons to your declarative format over the next decade or two because you were wrong about only needing a declarative format.

Spore reply

Also parquet if the data aren't mutated much.

jackpot reply

give me a category please

sunbeam60 reply

I’ll take “what’s that file format for $300 please”

MonkderZweite reply

Yeah, what was it? If office formats used sqlite instead of zip?

neomis

Data output from manufacturing equipment. Just pick a standard. JSON works. TOML / YAML if you need to write as you go. Stop creating your own format that’s 80% JSON anyways.

chunkystyles reply

JSON is nicer for some things, and YAML is nicer for others. It'd be nice if more apps would let you use whichever you prefer. The data can be represented in either, so let me choose.

freijon reply

KDL enters the chat

seaQueue

I'd like an update to the epub ebook format that leverages zstd compression and jpeg-xl. You'd see much better decompression performance (especially for very large books,) smaller file sizes and/or better image quality. I've been toying with the idea of implementing this as a .zpub book format and plugin for KOReader but haven't written any code for it yet.

Err(()).unwrap()

~~XML for machine-readable data because I live to cause chaos~~

Either markdown or Org for human-readable text-only documents. MS Office formats and the way they are handled have been a mess since the 2007 -x versions were introduced, and those and Open Document formats are way too bloated for when you only want to share a presentable text file.

While we're at it, standardize the fucking markdown syntax! I still have nightmares about Reddit's degenerate four-space-indent code blocks.

Agentseed reply

artemis.camp

Man, I'd love if markdown was more widely used, it's pretty much the perfect format for everything I do

raubarno reply

Markdown, CommonMark, .rst formats are good for printing basic rich text for technical documentation and so on, when text styling is made by an external application and you don't care about reproducible layout.

But you also want to make custom styles (font size, text alignment, colours), page layout (paper format, margin size, etc.) and make sure your document is reproducible across multiple processing applications, that the layout doesn't break, authoring tools, maybe even some version control, etc. This is when it strikes you bad.

PlexSheep reply

Markdown misses checkboxes anywhere, especially in tables.

But markdown is just good. It's just writing text as normal basically

tal reply

You can convert Markdown to a number of formats with pandoc, if you want to author in Markdown and just distribute in some other format.

Not going to work if you need to collaborate with other people, though.

lloram239

DraughtGlobe reply

feddit.nl

https://xkcd.com/927/

mexicancartel reply

Epub isn't supported by browsers

So you want EPUB support in browser and you have the ultimate document file format?

lloram239 reply

kshade reply

Weasyprint kinda is that, except that it's meant to be rendered to PDF.

mexicancartel reply

Can you explain why you need browser support for epub?

HaggierRapscallier reply

feddit.nl

EPubs are just websites bound in xhtml or something. Could we not just make every browser also an epub reader? (I just like epubs).

flying_sheep reply

They're basically zip files with a standardized metadata file to determine chapter order, index page, … and every chapter is a html file.
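To make that concrete, here is a rough sketch in Python (standard library only) that opens an EPUB as a zip archive and follows `META-INF/container.xml` to the package file that lists the chapters. The archive built in the example is a tiny stand-in rather than a valid book, and `find_package_document` is a name invented for this sketch.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in EPUB archives.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_document(epub_file):
    """Return the path of the package (.opf) file inside an EPUB archive."""
    with zipfile.ZipFile(epub_file) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return rootfile.get("full-path")


# Build a minimal stand-in archive in memory (not a complete, valid EPUB).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>""")
    archive.writestr("OEBPS/content.opf", "<package/>")
    archive.writestr("OEBPS/chapter1.xhtml", "<html/>")

buf.seek(0)
opf_path = find_package_document(buf)
print(opf_path)
```

A real reader would go on to parse the package file for the metadata and reading order, which is left out here.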

lloram239 reply

HaggierRapscallier reply

feddit.nl

Microsoft Edge's ePub reader was so good! I would have used it all the time for reading if it hadn't met its demise. Is there no equivalent fork or project out there? The existing epub readers always have these quirks that annoy me to the point where I'll just use Calibre's built in reader which works well enough.

shotgun_crab

TOML for configuration files

the_crab_man reply

100% this. Much more readable than JSON, YAML or other custom formats.

SunRed reply

I am surprised no one mentioned HCL yet. It's just as sane as toml but it is also properly nestable, like yaml, while being easily parsable and formattable. I wish it was used more as a config language.

DumbAceDragon

.gltf/.glb for models. It's way less of a headache than .obj and .dae, while also being way more feature rich than either.

Either that or .blend, because some things other than blender already support it and it'd make my life so much easier.

mercury reply

lemmy.blahaj.zone

USD is basically this, and is supported everywhere, give it a look!

DumbAceDragon reply

USD is more for scenes than models. It's meant primarily for stuff like 3dsmax and blender, and is far more complex than gltf.

It's also not really supported everywhere. Pretty much every game engine lacks support for USD, while most (except unity for some reason) have at least some gltf support.

USD is also, at least as far as I'm concerned, dead in the water. I have never encountered a single USD file in the wild, though that might just be because I mainly only work in blender and godot.

I'm not against USD, and I'd love to see it get some more love, but it serves a different purpose than gltf.

mercury reply

lemmy.blahaj.zone

Right, haha, I forgot about those! Sorry, I was only really thinking of different modeling software.

tal reply

game engine

So, you were refuting the "supported everywhere" bit. I'm not going to argue with that.

But I will point out that there is a difference between a format that is good for interchange and a format that is good for a game to internally use for rapid loading. I mean, games store things in a lot of ways that you wouldn't want to use for interchange. Games will have textures compressed in texture formats that can be sent straight to the video card, but are relatively space-inefficient; you'd want to use PNG or JPEG or something like that for interchange. You'll often use ZIP to bundle multiple files together; it's not necessarily because it's an ideal archive format, but because it provides indexed access to individual archive elements. Many games use WAV or other uncompressed audio formats that are cheap to load but not terribly space-efficient.

nothacking

UTF-8 for plain text, trying to figure out the encoding, especially with older files/equipment/software is super annoying.
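A quick illustration of why guessing hurts, sketched in Python: the same bytes produce different text depending on the encoding you assume, and nothing in the file tells you which assumption is right.

```python
text = "café"

# Bytes written as UTF-8 but read back as Latin-1 decode "successfully" into mojibake.
utf8_bytes = text.encode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")
print(as_latin1)

# Bytes written by a legacy Latin-1 system are not valid UTF-8 at all.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)
```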

drwankingstein

matroska for media, we already have MKA for audio and MKV for video. An image container would be good too.

mp4 is more prone to data loss and slower to parse, while also being less flexible, despite this it seems to be a sort of pseudo standard.

(MP4, M4A, HEIF formats like heic, avif)

jackpot reply

wait why not av4 or jpegxl

drwankingstein reply

those are media formats, not containers.

Turun reply

An mp4 file contains media in, for example, the h264 and AAC codecs, which are then combined for playback. It is not a codec itself.

jackpot reply

im compiling summarised list in body, what do i put this under and what file extensions

AlexWIWA

Markdown for all rich text that doesn't need super fancy shit like latex

darcy reply

which markdown implementation tho ?

Lemmy reply

iusearchlinux.fyi

GitLab/Hub obviously. Also it doesn't matter since I don't need to compile it to read it.

DonnerWolfBach reply

You compile your markdown and don't read it raw? /s

darcy reply

unironically a good point tho.

AlexWIWA reply

Oh I'm not brave enough for politics.

DigitalJacobin reply

morrowind reply

I'd argue asciidoc is better, but less well known

Lemmy reply

iusearchlinux.fyi

asciidoc lost me because it's not a markdown superset. Why invent yet another way of marking headlines?

Also GitLab/Hub markdown is the standard and I don't think we need another.

morrowind reply

That's a weird way of thinking. I could make the reverse argument.

Markdown lost me because it's not a subset to asciidoc, why invent yet another way of marking headlines?

Also asciidoc is the standard and I don't think we need another.

This whole thread is discussing ideal standards.

DigitalJacobin reply

danielfgom

Definitely FLAC for audio because it's lossless, if you record from a high fidelity source....

exFAT for external hard drives and SD cards because both Windows and Mac can read and write to it as well as Linux. And you don't have the permission pain....

glibg10b reply

What permission pain?

danielfgom reply

If you were to format the drive with ext4 and then copy something to it from Linux, and you then try to open it on another Linux machine (e.g. you distro hop after this event), it won't open the file because you aren't the owner.

Then you have to jump through hoops trying to make yourself the owner just so you can open your own file.

I learnt this the hard way so I just use exFAT and it all works.

glibg10b

JPEG XL for images because it compresses better than JPEG, PNG and WEBP most of the time.

XZ because it theoretically offers the highest compression ratio in most circumstances, and long decompression time isn't really an issue when the alternative is downloading a larger file over a slow connection.

Config files stored as serialized data structures instead of in plain text. This speeds up read times and removes the possibility of syntax or type errors. Also, fuck JSON.

I wish there were a good format for typesetting. Docx is closed and inflexible. LaTeX is unreadable, inefficient to type and hard to learn due to the inconsistencies that arise from its reliance on third-party packages and its lack of guidelines for their design.

nothendev reply

Typst for typesetting. Definitely underrated.

taladar

Some sort of machine-readable format for invoices and documents with related purposes (offers, bills of delivery, receipts,...) would be useful to get rid of some more of the informal paper or PDF human-readable documents we still use a lot. Ideally something JSON-based, perhaps with a cryptographic signature and encryption layer around it.

IsoKiero reply

This one exists. SEPA or ISO20022. Encryption/signing isn't included in the format, it's managed on the transfer layer, but that's pretty much the standard every business around here works with, and many don't even accept PDFs or other human-readable documents anymore if you want to get your money.

taladar reply

Well, okay, let me rephrase that. It would be nice if the B2C communication used something like that too.

IsoKiero reply

In Finland it kinda-sorta does, for some companies (mostly for things where you pay monthly). You can get your invoices directly to your banking account and even accept them automatically if you wish. And that doesn't include anything else than invoices, so not exactly what you're after. And I agree, that would be very nice.

Some companies, like one of our major grocery chains, offer to store your receipts on their online service, but I think that you can only get a copy of the receipt there and it's not machine readable.

Feathercrown reply

Woah neat

jackpot reply

whats the file extension and whats the category name, compiling list in body

IsoKiero reply

It doesn't have any standardized extension. My solution uses .xml (as that's the format internally), but it's not anywhere in the standard. About category I don't really know. SEPA stands for Single Euro Payment Area, but it contains quite a lot of things, https://www.iso20022.org/about-iso-20022 has a bit more info on the standard itself, but there's no catchy category name either.

darcy

i hate to be that guy, but pick the right tool for the right job. use markdown for a readme and latex for a research paper. you dont need to create 'the ultimate file format' that can do both, but worse and less compatible

intrepid reply

lemmy.ca

I agree with your assertion that there isn't a perfect format. But the example you gave - markdown vs latex has a counter example - org mode. It can be used for both purposes and a load of others. Matroska container is similarly versatile. They are examples that carefully designed formats can reach a high level of versatility, though they may never become the perfect solution.

jackpot reply

org mode? whats the file extension

intrepid reply

.dontuse for snaps

jackpot

i'd like there to be a way to standardise midi info in plugins for music

Possibly linux

lemmy.zip

Is ogg lossless?

Supermariofan67 reply

It's a container format that can hold either lossless or lossy codecs

Bitrot reply

Yes, if you encode with a lossless codec like FLAC or OggPCM and not Vorbis or Opus.

jackpot reply

u/lukmly013 💾 (lemmy.sdf.org)

Something for I/Q recordings. But I don't know what would do it. Currently the most supported format seems to be s16be WAV, but there's different formats, bit depths and encodings. I've seen .iq, .sdriq, .sdr, .raw, .wav. Then there's different bit depths and encodings: u8, s8, s16be, s16le, f32,... Also there's different ways metadata like center frequency is stored.
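To show why the lack of a standard bites, here is a hedged Python sketch that decodes the same raw bytes as interleaved I/Q pairs under different assumed sample formats. The `decode_iq` helper and the scaling constants are assumptions made for this example, not part of any of the formats listed above.

```python
import struct


def decode_iq(raw, sample_format):
    """Turn interleaved I/Q bytes into complex samples scaled to roughly [-1, 1]."""
    if sample_format == "u8":
        # Assumed convention: unsigned bytes centred on 127.5.
        values = [(byte - 127.5) / 127.5 for byte in raw]
    elif sample_format in ("s16le", "s16be"):
        endian = "<" if sample_format == "s16le" else ">"
        count = len(raw) // 2
        values = [v / 32768.0 for v in struct.unpack(f"{endian}{count}h", raw[:count * 2])]
    else:
        raise ValueError(f"unsupported sample format: {sample_format}")
    # Even positions are I (in-phase), odd positions are Q (quadrature).
    return [complex(i, q) for i, q in zip(values[0::2], values[1::2])]


# Two I/Q pairs written as little-endian signed 16-bit integers.
raw = struct.pack("<4h", 16384, 0, 0, -16384)

as_le = decode_iq(raw, "s16le")  # the intended interpretation
as_be = decode_iq(raw, "s16be")  # same bytes, wrong byte order
as_u8 = decode_iq(raw, "u8")     # same bytes, wrong sample type

print(as_le)
print(as_be)
print(len(as_u8))
```

None of the wrong guesses raises an error; they just give you a different signal, which is why the metadata matters as much as the samples.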

jackpot reply

what is this

u/lukmly013 💾 (lemmy.sdf.org) reply

God damnit. I wrote an answer and it disappeared a while after pressing reply. I am lazy to rewrite it and my eyes are sore.

Anyway, I am too dumb to actually understand I/Q samples. It stands for In-Phase and Quadrature, they are 90° out of phase from each other. That's somehow used to reconstruct a signal. It's used in different areas. For me it's useful to record raw RF signals from software defined radio (SDR).
For example, with older, less secure systems, you could record signal from someone's car keyfob, then use a Tx-capable SDR to replay it later. Ta-da! Replay attack. You unlocked someone's car.
In a better way, you could record raw signal from a satellite to later demodulate and decode it, if your computer isn't powerful enough to do it in real-time.

If you want an example, you can download a DAB+ radio signal recording here: https://www.sigidwiki.com/wiki/DAB%2B and then replay it in Welle.io (available as an AppImage) if it's in a compatible format. I haven't tested it.

PseudoSpock

.mom for ascii written Your Mom jokes.

christophski

feddit.uk

Some new format for DAW session files that is compatible with all DAWs. I believe ardour can import protools files but I bet a lot of work went into that.

d_k_bo reply

https://github.com/bitwig/dawproject

christophski reply

feddit.uk

Nice, hadn't seen this before. From the looks of the Ardour forum there is nobody currently looking at implementing the format but they seem open to it. I would contribute but I only know python so probably not much use. I could write an ardour-dawproject translator in python but it seems a bit pointless if someone goes and creates a proper implementation at some point anyway

OTDR measurement results in like XML or whatever open self documenting format, just not SOR. Or even just in actual standards compliant SOR, if that's all I can get.

jackpot reply

i dont understand any of the acronyms

gkpy reply

except XML xD

Kazumara reply

OTDR: Optical Time Domain Reflectometry
SOR: Standard OTDR Record
XML: Extensible Markup Language

.sor files are a mess, poorly standardized, too restrictive as a format, and every manufacturer makes their own proprietary extensions.

jackpot reply

what file extension, what category

Kazumara reply

Category: OTDR measurement results
File extension: .xml or something entirely new

jackpot reply

what on earth does that do

Kazumara reply

An OTDR sends pulses of laser light into a fiber optic cable and records the minute reflections that occur at every point of the cable over time. The time of arrival of the reflections corresponds to the position of where it was reflected. This way you can record the attenuation of an entire cable just from shining in pulses from one end. Good for checking if a new cable was properly installed, or for finding the location of issues in existing cables for debugging.
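The time-to-position mapping can be sketched in a few lines of Python: the light travels to the reflection point and back, so the distance is half the round trip at the speed of light in the fiber. The group index of 1.468 is an assumed typical value for illustration (real instruments use the value for the specific fiber), and `reflection_distance` is a hypothetical helper name.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def reflection_distance(round_trip_seconds, group_index=1.468):
    """Distance along the fiber to a reflection, from its round-trip time.

    group_index is an assumed typical value, not a universal constant.
    """
    speed_in_fiber = C_VACUUM / group_index
    return speed_in_fiber * round_trip_seconds / 2.0


# A reflection that arrives 10 microseconds after the pulse was sent.
distance_m = reflection_distance(10e-6)
print(f"{distance_m:.1f} m")
```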

musicmatze

.nix for software packaging.

jackpot reply

whats that and why not flatpak

Diana

explodicle

local106.com

blk.dat for all stored human labor, to eliminate the Cantillon effect.

barrett9h

192 kHz for music.

The CD was the worst thing to happen in the history of audio. 44 (or 48) kHz is awful, and it is still prevalent. It would be better to wait a few more years and have better quality.

Supermariofan67 reply

Why? What reason could there possibly be to store frequencies as high as 96 kHz? The limit of human hearing is 20 kHz, hence why 44.1 and 48 kHz sample rates are used

bellsDoSing reply

On top of that, 20 kHz is quite the theoretical upper limit.

Most people, be it due to aging (affects all of us) or due to behaviour (some way more than others), can't hear that far up anyway. Most people would be surprised how high up even e.g. 17 kHz is. Sounds a lot closer to very high pitched "hissing" or "shimmer", not something that's considered "tonal".

So yeah, saying "oh no, let me have my precious 30 kHz" really is questionable.

At least when it comes to listening to finished music files. The validity of higher sampling frequencies during various stages in the audio production process is a different, way less questionable topic.

christophski reply

feddit.uk

That is not what 96khz means. It doesn't just mean it can store frequencies up to that frequency, it means that there are 96,000 samples every second, so you capture more detail in the waveform.

Having said that I'll give anyone £1m if they can tell the difference between 48khz and 96khz. 96khz and 192khz should absolutely be used for capture but are absolutely not needed for playback.

Supermariofan67 reply

It means it can capture any frequency up to half the sample rate, perfectly. The "extra detail" in the waveform is higher frequencies beyond the range of human hearing

vrighter reply

this is a misconception about how waves are reconstructed. each sample is a single point in time. But the sampling theorem says that if you have a bunch of discrete samples, equally spaced in time, there is one and only one continuous solution that would hit those samples exactly, provided the original signal did not contain any frequencies above nyquist (half the sampling rate). Sampling any higher than that gives you no further useful information. There is still only one solution.

tldr: the reconstructed signal is a continuous analog signal, not a stair step looking thing

barrett9h reply

because if you use a 40 kHz signal to "draw" a 10 kHz wave, the wave will have only four "pixels", so all the high frequencies have very low fidelity

Supermariofan67 reply

As long as the audio frequency is less than half the sample rate, it is a mathematical function with only one (exact) wave that is able to fit all 4 points, so it is perfectly reconstructed. This video provides a great visualization of it https://www.youtube.com/watch?v=cIQ9IXSUzuM
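If you'd rather check this numerically than watch a video, here is a rough Python sketch of the 40 kHz / 10 kHz case using textbook sinc (Whittaker-Shannon) interpolation. The phase offset, the number of samples and the evaluation points are arbitrary choices for the demo, and because only a finite run of samples is used the result is very close rather than exact.

```python
import math

FS = 40_000   # sample rate in Hz
F = 10_000    # tone frequency in Hz, below the Nyquist limit of FS / 2
PHASE = 0.3   # arbitrary phase so the samples do not land on peaks and zeros only
N = 4000      # number of samples (0.1 s of audio)


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, fs, t):
    """Evaluate the band-limited signal through the samples at time t (seconds)."""
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))


# Only four samples per cycle of the tone.
samples = [math.sin(2 * math.pi * F * n / FS + PHASE) for n in range(N)]

# Compare the reconstruction with the true wave *between* sample points,
# near the middle of the recording where truncation matters least.
max_error = 0.0
for offset in (0.25, 0.37, 0.5):
    t = (N / 2 + offset) / FS
    true_value = math.sin(2 * math.pi * F * t + PHASE)
    max_error = max(max_error, abs(reconstruct(samples, FS, t) - true_value))

print(f"largest error between samples: {max_error:.2e}")
```

So the four "pixels" per cycle are enough: the values in between are pinned down by the band limit, not drawn as straight lines or stair steps.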

LaggyKar reply

I assume you're gonna back that up with a double blind ABX test?

Carlos Francisco 📑 reply

feddit.cl

44 kHz wasn't chosen randomly. It is based on the range of frequencies that humans can hear (20 Hz to 20 kHz) and the fact that a periodic waveform can be rebuilt exactly as the original (in terms of frequency) when the sampling rate is at least twice the bandwidth. So, if it is sampled at 44 kHz you can get all components up to 22 kHz, which is more than we can hear.

vrighter reply

this is wrong. the first thing done before playing one of those files is running the audio through a low pass filter that removes any extra frequencies 192khz captures. because most speakers can't play them, and in fact would distort the rest of the sound (due to badly recreating them, resulting in aliasing).

192khz has a place, and it's called the recording studio. It's only useful when handling intermediate products in mixing and mastering. Once that is done, only the audible portion is needed. The inaudible stuff can either be removed beforehand, saving storage space, or distributed (as 192khz files) and your player will remove them for you before playback

Trimatrix

.exe to .sh low key turn all windows machines to Linux machines

Hydroel reply

You're comparing compiled executables to scripts, it's apples and oranges.

pivot_root reply

I, for one, label my apple crates as oranges.

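#!/usr/bin/env bash
# Generate a launcher script that runs a Windows executable ($1) through Wine.
winebin="wine"
# `file` reports 64-bit Windows binaries as "PE32+"; -q keeps grep silent.
if file "$1" | grep -Eq 'PE32\+|64-bit'; then
    winebin="wine64"
fi

# The launcher forwards its own arguments ("$@") and Wine's exit status.
printf '#!/usr/bin/env bash\n%s %q "$@" || exit $?\n' "$winebin" "$1" > "$1.sh"
chmod +x "$1.sh"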

OrangeXarot reply