Release Audio Editing Triplets Dataset & SAO-Instruct On Hugging Face
Hey everyone! 👋 Niels from the open-source team at Hugging Face here. I came across some seriously cool work by @ungersboeck, and I'm stoked to share it with you all. The paper was featured on Hugging Face's daily papers (check it out: https://huggingface.co/papers/2510.22795), and I'd love to chat about how we can get its resources, the Audio Editing Triplets Dataset and the SAO-Instruct model, shining on the Hugging Face Hub!
Making Your Work Discoverable: Hugging Face's Role
First off, let's talk about why getting your stuff on the Hugging Face Hub is a win-win. The paper page is like a central hub where people can dive deep into your research. They can discuss your paper, find all the goodies associated with it (like your models, datasets, and even demos!), and connect with your work. And guess what? You can even claim your paper on the platform, which will make it pop up on your public profile, and you can link your GitHub and project page URLs for everyone to see. Pretty neat, right? The main goal is to increase the visibility and findability of your work, so more people can benefit from it.
Now, let's get into the specifics of making these resources easy to find and use. A big part of that is tagging: tags in the model and dataset metadata let people filter and search the Hub efficiently, and they help surface related resources next to yours.
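Tags live in the YAML metadata block at the top of a repository's README.md (the model or dataset card). As a rough sketch, assuming only simple string and list values, that block could be generated like this. The task and tag values in the usage are hypothetical placeholders, not ones the authors have chosen.

```python
from typing import Dict, List, Union

# A metadata value is either a single string or a list of strings.
MetaValue = Union[str, List[str]]


def card_front_matter(metadata: Dict[str, MetaValue]) -> str:
    """Render a minimal YAML front-matter block for a Hub README.md.

    Only plain strings and lists of strings are handled; values containing
    YAML special characters (e.g. ':') would need quoting or a YAML library.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values for a dataset card; pick tags that fit the real data.
    print(card_front_matter({
        "task_categories": ["audio-to-audio"],
        "tags": ["audio-editing", "instruction-following"],
    }))
```

The rendered block goes at the very top of README.md, followed by the card's regular Markdown description.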
The SAO-Instruct Model: Already on the Hub!
I was super excited to see that the SAO-Instruct model is already available on the Hugging Face Hub! You can find it right here: https://huggingface.co/disco-eth/sao-instruct. Way to go, guys!
To make this even better, we can officially link it to your paper page, which boosts its discoverability. Readers of the paper page then find the model right there, and visitors to the model find the paper, so all the relevant information sits in one place and it's easier for people to understand your work and build on it.
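Per the Hub's model card documentation, when a card links to an arXiv paper, the Hub extracts the arXiv ID and adds it as an `arxiv:<ID>` tag, which is what ties the repo to the paper page. The snippet below only approximates that matching so a README can be sanity-checked before pushing. The regex is an assumption, not the Hub's actual parser; it covers new-style IDs only, and treating a huggingface.co/papers link as equivalent is also an assumption (the arXiv link is the documented route).

```python
import re
from typing import List

# Assumed pattern: new-style arXiv IDs (YYMM.NNNNN) behind arxiv.org or
# huggingface.co/papers links, with any version suffix (v1, v2, ...) dropped.
_PAPER_LINK = re.compile(
    r"(?:arxiv\.org/(?:abs|pdf)|huggingface\.co/papers)/(\d{4}\.\d{4,5})(?:v\d+)?"
)


def paper_tags(card_text: str) -> List[str]:
    """Return de-duplicated `arxiv:<id>` tags for paper links found in a card."""
    tags: List[str] = []
    for arxiv_id in _PAPER_LINK.findall(card_text):
        tag = f"arxiv:{arxiv_id}"
        if tag not in tags:
            tags.append(tag)
    return tags


if __name__ == "__main__":
    readme = "Paper: https://huggingface.co/papers/2510.22795"
    print(paper_tags(readme))
```

If the list comes back empty, the card probably doesn't contain a recognizable paper link yet.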
For future reference, if you want to upload models, here's a handy guide: https://huggingface.co/docs/hub/models-uploading.
When uploading models, we suggest that researchers push each model checkpoint to a separate model repository. This allows download stats to work properly. Then, you can link the checkpoints to your paper page.
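One repository per checkpoint could be scripted roughly as follows. This is a sketch assuming each checkpoint sits in its own local folder; the namespace, prefix, and folder names are hypothetical, and the upload step needs an authenticated session (for example via `huggingface_hub.login()`).

```python
from pathlib import Path
from typing import Dict, List


def plan_checkpoint_repos(checkpoint_dirs: List[str], namespace: str, prefix: str) -> Dict[str, str]:
    """Map each local checkpoint folder to its own Hub repo id (naming scheme is hypothetical)."""
    plan: Dict[str, str] = {}
    used = set()
    for folder in checkpoint_dirs:
        repo_id = f"{namespace}/{prefix}-{Path(folder).name}"
        if repo_id in used:
            # Two folders with the same name would silently overwrite each other.
            raise ValueError(f"two checkpoints would share the repo {repo_id}")
        used.add(repo_id)
        plan[folder] = repo_id
    return plan


def upload_checkpoints(plan: Dict[str, str]) -> None:
    """Create one model repo per checkpoint and upload its folder."""
    from huggingface_hub import HfApi  # lazy import: planning works without it

    api = HfApi()
    for folder, repo_id in plan.items():
        api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
        api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
```

For instance, `plan_checkpoint_repos(["runs/step-1000", "runs/step-2000"], "your-org", "my-model")` yields `your-org/my-model-step-1000` and `your-org/my-model-step-2000`; each resulting repo keeps its own download stats and can then be linked to the paper page.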
Here are some tips for uploading models to Hugging Face:
- Use the PyTorchModelHubMixin: If your model is built with PyTorch, you can use the `PyTorchModelHubMixin` class. It adds `from_pretrained` and `push_to_hub` to any custom `nn.Module`, which streamlines both uploading and loading your models.
- Leverage the hf_hub_download one-liner: You can use `hf_hub_download` to download a checkpoint from the Hub. It fetches individual files from model or dataset repos straight into the user's local environment.
Audio Editing Triplets Dataset: Let's Get It Hosted!
Now, let's chat about the Audio Editing Triplets Dataset. I noticed that your paper introduces this dataset, and you've provided scripts to generate it. Awesome work! We'd love to host this dataset on the Hugging Face Hub as a pre-packaged download. That makes it super simple for folks to use.
Imagine this: people can load your dataset with just a few lines of code, like this:
```python
from datasets import load_dataset
dataset = load_dataset("your-hf-org-or-username/your-dataset")
```
This lowers the barrier for others: nobody has to rerun the generation scripts first, so people can load the dataset straight into their own environment and get going. That speeds up research built on your work and gives the dataset more exposure.
For a guide on loading datasets (the `load_dataset` side of the workflow), check this out: https://huggingface.co/docs/datasets/loading.
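If the generation scripts produce (input audio, instruction, edited audio) triplets, one way to pre-package them is to build a `datasets.Dataset`, cast the audio path columns to the `Audio` feature, and push it to the Hub. The sketch below assumes that shape; the `Triplet` fields and column names are hypothetical and should follow the actual schema from the paper's scripts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Triplet:
    """Hypothetical record: source clip path, edit instruction, edited clip path."""
    input_path: str
    instruction: str
    output_path: str


def to_columns(triplets: List[Triplet]) -> Dict[str, List[str]]:
    """Turn a list of triplets into the column dict that Dataset.from_dict expects."""
    return {
        "input_audio": [t.input_path for t in triplets],
        "instruction": [t.instruction for t in triplets],
        "output_audio": [t.output_path for t in triplets],
    }


def push_triplets(triplets: List[Triplet], repo_id: str) -> None:
    """Build the dataset and upload it (requires `datasets` and a logged-in session)."""
    from datasets import Audio, Dataset  # lazy import: only needed for the upload

    ds = Dataset.from_dict(to_columns(triplets))
    # With Audio columns, push_to_hub embeds the file bytes by default,
    # so the Hub copy doesn't depend on the local paths.
    ds = ds.cast_column("input_audio", Audio()).cast_column("output_audio", Audio())
    ds.push_to_hub(repo_id)
```

Once `push_triplets(rows, "your-hf-org-or-username/your-dataset")` has run, the `load_dataset` call shown earlier should be able to load it.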
Dataset Viewer: Explore with Ease!
Another cool feature is the dataset viewer. It lets people quickly explore the first few rows of your data right in their browser. You can find it here: https://huggingface.co/docs/hub/en/datasets-viewer.
It's a great way to showcase your dataset: people can get a feel for its structure and content, including playing back audio columns, before downloading anything.
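The same preview is also exposed over HTTP by the dataset viewer API, which can be handy for checking that the viewer has processed a freshly pushed dataset. A standard-library sketch, assuming the `first-rows` endpoint and the `default` config and `train` split names (the actual names depend on how the dataset is laid out):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

VIEWER_API = "https://datasets-server.huggingface.co"


def first_rows_url(dataset: str, config: str = "default", split: str = "train") -> str:
    """Build the first-rows URL; urlencode escapes the '/' in the repo id."""
    query = urlencode({"dataset": dataset, "config": config, "split": split})
    return f"{VIEWER_API}/first-rows?{query}"


def fetch_first_rows(dataset: str, config: str = "default", split: str = "train") -> dict:
    """Fetch the preview rows as parsed JSON (needs network access)."""
    with urlopen(first_rows_url(dataset, config, split), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch_first_rows("your-hf-org-or-username/your-dataset")` would return the viewer's JSON preview once the dataset is public and processed; the repo ID is a placeholder.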
Let's Make It Happen!
So, what do you think? Are you interested in getting your dataset hosted? Do you need any help with this? We're here to support you every step of the way.
I'm excited to see your work flourish on the Hub and make a significant impact in the audio editing field. Feel free to reach out; I'm happy to help!
Cheers,
Niels
ML Engineer @ HF 🤗