Spyke

Comment on

The Curse of Recursion: Training on Generated Data Makes Models Forget

If the effect is strong enough, it could seriously harm LLM training in the near future, since more and more of the internet contains ChatGPT and GPT-4 content and automatic detectors are currently quite poor.

machinelearning

Comment on

Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

It seems like for creative text generation tasks, metrics have been shown to be deficient; this even holds for the new model-based metrics. That leaves human evaluation (both intrinsic and extrinsic) as the gold standard for those types of tasks. I wonder if the results from this paper (and other future papers that look at automatic CV metrics) will lead reviewers to demand more human evaluation in CV tasks, as they do for certain NLP tasks.

machinelearning

Comment on

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Reply in thread

If there isn't any discussion on reddit (there is none in this case), I don't see a reason to link to reddit; you can just link to the project page. That said, if you think there is important discussion happening that helps with understanding the paper, then use a teddit link instead, like:

https://teddit.net/r/MachineLearning/comments/14pq5mq/r_hardwiring_vit_patch_selectivity_into_cnns/

machinelearning

Comment on

[@machinelearning](https://kbin.social/m/machinelearning) am I in the right place? Lol

@Koffindodjer indeed you are!

machinelearning

Comment on

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Also reminds me of this ICLR paper: Linearly Mapping from Image to Text Space.

machinelearning

Comment on

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Reply in thread

That's appreciated!

machinelearning

Comment on

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Research into efficient optimization techniques seems pretty important given the scale of LLMs these days. Nice to see a second-order approach that achieves reasonable wall-clock improvements.

machinelearning

Comment on

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Averaging model weights seems to help across textual domains as well, see Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models and Scaling Expert Language Models with Unsupervised Domain Discovery. I wonder if the two types of averaging (across hyperparameters and across domains) can be combined to produce even better models.
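
For anyone unfamiliar with the idea, here is a rough sketch of uniform weight averaging: take several fine-tuned models with the same architecture and average each parameter element-wise. This is only an illustration, not the exact procedure from any of the papers mentioned. The function name `average_weights` and the plain dict-of-lists "state dict" format are assumptions made to keep the example dependency-free; in practice you would do the same thing over framework tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a framework state dict:
# parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def average_weights(state_dicts: List[StateDict]) -> StateDict:
    """Uniformly average the parameters of models sharing one architecture."""
    if not state_dicts:
        raise ValueError("need at least one model")

    reference = state_dicts[0]
    for sd in state_dicts[1:]:
        # All models must have identical parameter names and shapes.
        if sd.keys() != reference.keys():
            raise ValueError("parameter names differ between models")
        for name in reference:
            if len(sd[name]) != len(reference[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")

    n = len(state_dicts)
    averaged: StateDict = {}
    for name in reference:
        # zip(*...) groups the i-th weight of this parameter across models.
        per_weight = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(values) / n for values in per_weight]
    return averaged


# Two toy "fine-tuned models" with a single layer each.
model_a = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
soup = average_weights([model_a, model_b])
```

The averaged model has the same number of parameters as each input model, which is why inference cost does not go up. The same function works whether the inputs differ by hyperparameters or by training domain.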

machinelearning

Comment on

Extending Context Window of Large Language Models via Positional Interpolation

Reply in thread

Do you have a link?
