Spyke


SeamlessM4T: Multimodal Model for Speech Translation

Meta has released SeamlessM4T, a general multilingual speech/text model that is claimed to surpass OpenAI's Whisper. It's available on GitHub, and everything can be used for free in non-commercial settings.

Model Features:

  • Automatic speech recognition for ~100 languages.
  • Speech-to-text translation for ~100 input/output languages.
  • Speech-to-speech translation for ~100 input languages and 35 output languages.
  • Text-to-text and text-to-speech translation for nearly 100 languages.

Dataset:

  • SeamlessAlign: Open multimodal translation dataset with 270,000 hours of speech and text alignments.

Technical Insights:

  • Utilizes a multilingual and multimodal text embedding space for 200 languages.
  • Applied a teacher-student approach to extend this embedding space to the speech modality, covering 36 languages.
  • Mining performed on publicly available repositories resulted in 443,000 hours of speech aligned with texts and 29,000 hours of speech-to-speech alignments.
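To make the mining idea a bit more concrete, here is a toy sketch of pairing speech clips with sentences once both live in the same embedding space. Everything in it is an assumption for illustration: the vectors are made up, `mine_pairs` and the 0.9 threshold are hypothetical, and plain cosine similarity with a fixed threshold is a simplification, not necessarily the scoring Meta actually used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def mine_pairs(speech_embs, text_embs, threshold=0.9):
    """For each speech clip, keep its nearest sentence if it is similar enough.

    threshold is a hypothetical cut-off chosen for this toy example.
    """
    pairs = []
    for s_id, s_vec in speech_embs.items():
        best_id, best_score = None, -1.0
        for t_id, t_vec in text_embs.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None and best_score >= threshold:
            pairs.append((s_id, best_id, best_score))
    return pairs

# Made-up 3-dimensional embeddings; real ones would come from trained encoders.
speech = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 0.2, 0.9],
    "clip_c": [0.5, 0.5, 0.0],  # has no close sentence, so it is dropped
}
text = {
    "sent_1": [1.0, 0.0, 0.0],
    "sent_2": [0.0, 0.0, 1.0],
    "sent_3": [0.5, 0.5, 0.5],
}

mined = mine_pairs(speech, text)
for s_id, t_id, score in mined:
    print(f"{s_id} <-> {t_id} (cosine {score:.3f})")
```

The point is only that a shared embedding space turns alignment into a nearest-neighbour search, so no transcripts are needed to pair audio with text.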

Toxicity Filter:

  • The model identifies toxic words from speech inputs/outputs and filters unbalanced toxicity in training data.
  • The demo detects toxicity in both input and output. If toxicity is only detected in the output, a warning is included and the output is not shown.
  • Given how impaired llama2-chat has been by these kinds of filters, it's unclear how useful these models are in a general setting.
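As I understand the demo behaviour described above, the gating logic boils down to something like the sketch below. This is a guess at the control flow only: the word list, `contains_toxicity`, and `gate_output` are hypothetical names, and the real detector is certainly more sophisticated than a word lookup. What happens when both input and output are flagged isn't stated, so passing the output through in that case is my assumption.

```python
# Hypothetical placeholder word list; the real toxicity lists are not shown in the post.
TOXIC_WORDS = {"badword", "slur"}

def contains_toxicity(text: str) -> bool:
    """Very naive check: is any word of the text in the placeholder list?"""
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return any(w in TOXIC_WORDS for w in words)

def gate_output(source_text: str, translated_text: str) -> dict:
    """Hide the translation if toxicity appears only on the output side."""
    toxic_in = contains_toxicity(source_text)
    toxic_out = contains_toxicity(translated_text)
    if toxic_out and not toxic_in:
        # Toxicity was added by the model: warn and do not show the output.
        return {"output": None, "warning": "Toxicity detected in output only."}
    # Assumption: in every other case the output is shown unchanged.
    return {"output": translated_text, "warning": None}

print(gate_output("hello there", "you badword"))
print(gate_output("you badword", "you badword"))
```

The first call hides the output, while the second shows it because the flagged word was already in the input.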
View original on sh.itjust.works

Hugging Face Releases IDEFICS: An Open-Access 80B Visual Language Model Replicating DeepMind's Flamingo

Hugging Face released IDEFICS, an 80B open-access visual language model replicating DeepMind's unreleased Flamingo. Built entirely on public data, it's the first of its size available openly. Part of its training utilized OBELICS, a dataset with 141M web pages, 353M images, and 115B text tokens from Common Crawl.



Announcing StableCode — Stability AI

Stability AI released three new 3B models for coding:

  • stablecode-instruct-alpha-3b (context length 4k)
  • stablecode-completion-alpha-3b-4k (context length 4k)
  • stablecode-completion-alpha-3b (context length 16k)

I haven't tried any of them yet, since I'm waiting for the GGML files to be supported by llama.cpp, but I think the 16k model in particular looks interesting. If anyone wants to share their experience with it, I'd be happy to hear it!

Link: https://stability.ai/blog/stablecode-llm-generative-ai-coding

What are your favorite models so far?

I think it's a good idea to share experiences with LLMs here, since benchmarks can only give a very rough overview of how well a model performs.

So please share how much you're using LLMs, what you use them for, and how well they perform at those tasks. For example, here are my answers to these questions:

Usage

I use LLMs daily for work and for random questions that I would previously use web search for.

I mainly use LLMs for reasoning-heavy tasks, such as assisting with math or programming. Other frequent tasks include proofreading, helping with bureaucracy, or assisting with writing when it matters.

Models

The one I find most impressive at the moment is TheBloke/airoboros-l2-70B-gpt4-1.4.1-GGML/airoboros-l2-70b-gpt4-1.4.1.ggmlv3.q2_K.bin. It often manages to reason correctly on questions where most other models I tried fail, even though most humans would get them right. I was surprised that something using only 2.5 bits per weight on average could produce anything but garbage. The downside is that loading times are rather long, so I wouldn't ask it a question if I didn't want to wait (time to first token is almost 50s!). I'd love to hear how bigger quantizations or the unquantized versions perform.
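For a rough sense of why the quantization matters, here's a back-of-the-envelope estimate of the weight size. It assumes a round 70 billion parameters (taken from the "70B" in the model name, so only approximate) and counts the weights alone, ignoring the KV cache and any file overhead. The helper `weight_size_gb` is just something I made up for this calculation.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 70e9  # assumption: round number implied by "70B"

q2k_gb = weight_size_gb(N_PARAMS, 2.5)   # ~2.5 bits per weight on average
fp16_gb = weight_size_gb(N_PARAMS, 16)   # unquantized half precision

print(f"q2_K: about {q2k_gb:.1f} GB")
print(f"fp16: about {fp16_gb:.1f} GB")
print(f"ratio: {fp16_gb / q2k_gb:.1f}x smaller")
```

So the quantized file is on the order of 22 GB instead of roughly 140 GB for fp16, which is the difference between fitting on a well-equipped desktop and not fitting at all.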

Another one that made a good impression on me is Qwen-7B-Chat (demo). It manages to correctly answer some questions where even some llama2-70b finetunes fail, but so far I've been getting memory leaks when running it on my M1 Mac in fp16 mode, so I haven't used it a lot. (Update: this seems to have been fixed!)

All the other models I briefly tried were not very useful. It's nice to be able to run them locally, but they were so much worse than ChatGPT that it's often not even worth considering them.

