Applied and Computational Engineering

- The Open Access Proceedings Series for Conferences


Proceedings of the 4th International Conference on Signal Processing and Machine Learning

Series Vol. 52, 27 March 2024


Open Access | Article

Exploring the potential of data augmentation in poetry generation with small-scale corpora

Renxiang Huang *,1
1 University of Illinois Urbana-Champaign

* Author to whom correspondence should be addressed.

Applied and Computational Engineering, Vol. 52, 31-38
Published 27 March 2024. © 2024 The Author(s). Published by EWA Publishing
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Renxiang Huang. Exploring the potential of data augmentation in poetry generation with small-scale corpora. ACE (2024) Vol. 52: 31-38. DOI: 10.54254/2755-2721/52/20241204.

Abstract

Poetry generation is a complex task in natural language processing, especially when only small datasets are available. Data augmentation has proven effective at improving the performance of deep learning models across a range of tasks, including image classification and speech recognition. This study therefore examines the impact of four data augmentation methods, Synonym Replacement, Random Insertion, Random Swap, and Random Deletion, on poetry generation with a small poetry dataset. The results show that Random Insertion outperformed the other techniques on Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and manual evaluation, while Synonym Replacement performed worst on all three. These findings confirm the value of data augmentation for poetry generation tasks: augmentation can help mitigate the problem of limited training data and improve the efficiency of deep learning models. Future research could explore more advanced data augmentation techniques and their impact on poetry generation.
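The four augmentation operations named in the abstract come from the EDA toolkit of Wei and Zou (reference 4). The sketch below is illustrative only, not the paper's code: it substitutes a toy hand-made synonym table for the WordNet lookup the study uses (reference 6), and all function names and parameters are our own.

```python
import random

# Toy synonym table standing in for the WordNet lookup the study uses;
# any thesaurus interface could be substituted here.
SYNONYMS = {
    "happy": ["glad", "joyful"],
    "quiet": ["still", "silent"],
    "night": ["dusk", "evening"],
}

def synonym_replacement(words, n=1):
    """Replace up to n randomly chosen words that have synonyms."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    random.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = random.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, n=1):
    """Insert a synonym of a random word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        has_syn = [w for w in out if w in SYNONYMS]
        if not has_syn:
            break
        syn = random.choice(SYNONYMS[random.choice(has_syn)])
        out.insert(random.randrange(len(out) + 1), syn)
    return out

def random_swap(words, n=1):
    """Swap the words at two randomly chosen positions, n times."""
    out = list(words)
    for _ in range(n):
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p=0.1):
    """Drop each word with probability p; never return an empty line."""
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]

line = "the quiet night falls happy and slow".split()
random.seed(0)
for aug in (synonym_replacement, random_insertion, random_swap, random_deletion):
    print(aug.__name__, " ".join(aug(line)))
```

Applied per line of verse, each operation yields a perturbed copy of the poem that can be added to the training corpus; in the study, Random Insertion (which only adds words and preserves the original line) proved the most effective of the four.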

Keywords

Data Augmentation, Poetry Generation, Natural Language Processing, Deep Learning.
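The BLEU metric used for evaluation in the abstract can be reproduced with a short reference implementation. The sketch below is a simplified sentence-level BLEU (modified n-gram precisions with a brevity penalty, no smoothing) and is not the paper's exact evaluation code:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: brevity penalty times the geometric mean of
    modified n-gram precisions for n = 1..max_n (unsmoothed, so the
    score is 0 whenever any n-gram precision is 0)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(map(math.log, precisions)) / max_n)

ref = "shall i compare thee to a summer day".split()
print(sentence_bleu(ref, ref))  # identical sentences score 1.0
```

ROUGE, the recall-oriented counterpart, follows the same n-gram-counting pattern but normalizes overlap by the reference length rather than the candidate length.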

References

1. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.

2. Li, J., Tang, T., Zhao, W. X., Nie, J. Y., & Wen, J. R. (2022). Pretrained language models for text generation: A survey. arXiv preprint arXiv:2201.05273.

3. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 1-9.

4. Wei, J., & Zou, K. (2019). EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196.

5. Merve Noyan. Poetry. URL: https://huggingface.co/datasets/merve/poetry. Last accessed: 2023/09/17.

6. Princeton University. WordNet. URL: https://wordnet.princeton.edu/. Last accessed: 2023/09/23.

7. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

8. Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1-35.

9. Yan, R. (2016). i, Poet: Automatic poetry composition through recurrent neural networks with iterative polishing schema. In IJCAI, 2238-2244.

10. Yan, R., Jiang, H., Lapata, M., Lin, S. D., Lv, X., & Li, X. (2013). I, poet: automatic chinese poetry composition through a generative summarization framework under constrained optimization. In Twenty-Third International Joint Conference on Artificial Intelligence, 2197-2203.

11. Van de Cruys, T. (2020). Automatic poetry generation from prosaic text. In Proceedings of the 58th annual meeting of the association for computational linguistics, 2471-2480.

12. Kobayashi, S. (2018). Contextual augmentation: Data augmentation by words with paradigmatic relations. arXiv preprint arXiv:1805.06201.

13. Zhang, Y., Gan, Z., & Carin, L. (2016). Generating text via adversarial training. In NeurIPS workshop on Adversarial Training, 21, 21-32.

14. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Advances in neural information processing systems, 30, 1-9.

15. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training, 1-12.

16. Oliveira, H. G., Hervás, R., Díaz, A., & Gervás, P. (2017). Multilingual extension and evaluation of a poetry generator. Natural Language Engineering, 23(6), 929-967.

Data Availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Authors who publish in this series agree to the following terms:

1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.

2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open Access Instruction).

Volume Title: Proceedings of the 4th International Conference on Signal Processing and Machine Learning
ISBN (Print): 978-1-83558-349-4
ISBN (Online): 978-1-83558-350-0
Published Date: 27 March 2024
Series: Applied and Computational Engineering
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
DOI: 10.54254/2755-2721/52/20241204
Copyright: © 2024 The Author(s)
Open Access: CC BY (see license statement above)

Copyright © 2023 EWA Publishing, unless otherwise stated.