References

1

M. Levy. Improving perceptual tempo estimation with crowd-sourced annotations. In 12th International Society for Music Information Retrieval Conference (ISMIR). 2011.

2

M. Cartwright and B. A. Pardo. VocalSketch: vocally imitating audio concepts. In Proceedings of the 33rd Annual CHI Conference on Human Factors in Computing Systems (CHI). 2015.

3

B. McFee, E.J. Humphrey, and J.P. Bello. A software framework for musical data augmentation. In 16th International Society for Music Information Retrieval Conference (ISMIR). 2015.

4

S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and Y. Mitsufuji. Improving music source separation based on deep neural networks through data augmentation and network blending. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017.

5

M. Cartwright and J. P. Bello. Increasing drum transcription vocabulary using data synthesis. In Proc. of the 21st International Conference on Digital Audio Effects (DAFx). 2018.

6

Ethan Manilow, Gordon Wichern, Prem Seetharaman, and Jonathan Le Roux. Cutting music source separation some Slakh: a dataset to study the impact of training data quality and quantity. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). 2019.

7

J. Pons, O. Nieto, M. Prockup, E. M. Schmidt, A. F. Ehmann, and X. Serra. End-to-end learning for music audio tagging at scale. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR). 2018.

8

T. Li and M. Ogihara. Music artist style identification by semi-supervised learning from both lyrics and content. In Proceedings of the 12th ACM International Conference on Multimedia. 2004.

9

W. You and R. B. Dannenberg. Polyphonic music note onset detection using semi-supervised learning. In Proceedings of the 8th International Society for Music Information Retrieval Conference (ISMIR). 2007.

10

Yu Wang, Justin Salamon, Mark Cartwright, Nicholas J. Bryan, and Juan Pablo Bello. Few-shot drum transcription in polyphonic music. In International Society for Music Information Retrieval (ISMIR) Conference. 2020.

11

Yisheng Song, Ting-Yuan Wang, Subrota Kumar Mondal, and Jyoti Prakash Sahoo. A comprehensive survey of few-shot learning: evolution, applications, challenges, and opportunities. ArXiv, 2022.

12

Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, and Bryan Pardo. Leveraging hierarchical structures for few-shot musical instrument recognition. In Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR 2021). 2021. URL: https://interactiveaudiolab.github.io/assets/papers/flores2021leveraging.pdf.

13

Yu Wang, Daniel Stoller, Rachel M. Bittner, and Juan Pablo Bello. Few-shot musical source separation. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 121–125. 2022. doi:10.1109/ICASSP43922.2022.9747536.

14

Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching networks for one shot learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, 3637–3645. Red Hook, NY, USA, 2016. Curran Associates Inc.

15

Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In International Conference on Learning Representations. 2017. URL: https://openreview.net/forum?id=rJY0-Kcll.

16

Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.

17

Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H.S. Torr, and Timothy M. Hospedales. Learning to compare: relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2018.

18

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, 1126–1135. PMLR, 06–11 Aug 2017. URL: https://proceedings.mlr.press/v70/finn17a.html.

19

Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-SGD: learning to learn quickly for few-shot learning. CoRR, 2017. URL: http://arxiv.org/abs/1707.09835, arXiv:1707.09835.

20

Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. Meta-transfer learning for few-shot learning. In Computer Vision and Pattern Recognition (CVPR), 403–412. 2019.

21

Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Computer Vision and Pattern Recognition (CVPR). 2009.

22

Abhijit Bendale and Terrance E. Boult. Towards open set deep networks. In Computer Vision and Pattern Recognition (CVPR). 2016.

23

Andrea Frome, Greg S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. DeViSE: a deep visual-semantic embedding model. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. 2013.

24

Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Gregory S. Corrado, and Jeffrey Dean. Zero-shot learning by convex combination of semantic embeddings. In International Conference on Learning Representations (ICLR). 2013.

25

Elyor Kodirov, Tao Xiang, and Shaogang Gong. Semantic autoencoder for zero-shot learning. CoRR, 2017.

26

Yongqin Xian, Tobias Lorenz, Bernt Schiele, and Zeynep Akata. Feature generating networks for zero-shot learning. CoRR, 2017.

27

Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.

28

Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report, California Institute of Technology, 2011.

29

Jeong Choi, Jongpil Lee, Jiyoung Park, and Juhan Nam. Zero-shot learning for audio-based music classification and tagging. In International Society for Music Information Retrieval (ISMIR) Conference. 2019.

30

George A. Miller. WordNet: a lexical database for English. Communications of the ACM, 1995.

31

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. 2013.

32

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT. 2019.

33

SeungHeon Doh, Jongpil Lee, Tae Hong Park, and Juhan Nam. Musical word embedding: bridging the gap between listening contexts and music. ArXiv, 2020.

34

Sergio Oramas, Oriol Nieto, Francesco Barbieri, and Xavier Serra. Multi-label music genre classification from audio, text and images using deep features. In International Society for Music Information Retrieval (ISMIR) Conference. 2017.

35

Bangpeng Yao and Li Fei-Fei. Grouplet: a structured image representation for recognizing human and object interactions. In Computer Vision and Pattern Recognition (CVPR). 2010.

36

Szu-Yu Chou, Kai-Hsiang Cheng, Jyh-Shing Roger Jang, and Yi-Hsuan Yang. Learning to match transient sound events using attentional similarity for few-shot sound recognition. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 26–30. 2019.

37

Shilei Zhang, Yong Qin, Kewei Sun, and Yonghua Lin. Few-shot audio classification with attentional graph neural networks. In INTERSPEECH. 2019.

38

Erich M. von Hornbostel and Curt Sachs. Classification of musical instruments: translated from the original German by Anthony Baines and Klaus P. Wachsmann. The Galpin Society Journal, 14:3–29, 1961.

39

Y. Wang, J. Salamon, N. J. Bryan, and J. Pablo Bello. Few-shot sound event detection. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 81–85. 2020. doi:10.1109/ICASSP40776.2020.9054708.

40

Jeong Choi, Jongpil Lee, Jiyoung Park, and Juhan Nam. Zero-shot learning and knowledge transfer in music classification and tagging. ArXiv, 2019.

41

R. Vogl, G. Widmer, and P. Knees. Towards multi-instrument drum transcription. In Proc. of the 21st International Conference on Digital Audio Effects (DAFx). 2018.

42

Andreas Jansson, Eric Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, and Tillman Weyde. Singing voice separation with deep U-Net convolutional networks. In Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). 2017.

43

Olga Slizovskaia, Leo Kim, Gloria Haro, and Emilia Gomez. End-to-end sound source separation conditioned on instrument labels. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019.

44

Gabriel Meseguer-Brocal and Geoffroy Peeters. Conditioned-U-Net: introducing a control mechanism in the U-Net for multiple source separations. In Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR). 2019.

45

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C. Courville. FiLM: visual reasoning with a general conditioning layer. In AAAI. 2018.

46

Siavash Khodadadeh, Ladislau Boloni, and Mubarak Shah. Unsupervised meta-learning for few-shot image classification. In Advances in Neural Information Processing Systems, 10132–10142. 2019.

47

Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. Meta-learning for semi-supervised few-shot classification. In Proceedings of 6th International Conference on Learning Representations ICLR. 2018.

48

Yu Wang, Nicholas J. Bryan, Mark Cartwright, Juan Pablo Bello, and Justin Salamon. Few-shot continual learning for audio classification. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 321–325. 2021.

49

Yu Wang, Nicholas J. Bryan, Justin Salamon, Mark Cartwright, and Juan Pablo Bello. Who calls the shots? Rethinking few-shot learning for audio. In 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). 2021.

50

Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML'16, 1842–1850. JMLR.org, 2016.