In this paper, we study the trade-off between cleaning existing noisy annotations and collecting new ones under a fixed budget, a question that arises naturally in reduced-data settings.
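As a minimal formalization of this trade-off (a sketch; the cost and count symbols below are illustrative assumptions, not notation from the paper): given a total budget $B$, a per-example cleaning cost $c_c$, and a per-example annotation cost $c_a$, the practitioner chooses the number of noisy examples to clean, $n_c$, and the number of new examples to annotate, $n_a$, subject to

$$
n_c \, c_c + n_a \, c_a \le B,
$$

and seeks the split that maximizes downstream task performance.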
Our main contributions are:
- A comprehensive study of learning with noisy NLP data in practical settings:
  - applied to modern pre-trained models (with older models as baselines);
  - applied to diverse NLP tasks of varying difficulty (not just classification).
- A new direction: cleaning existing noisy labels rather than annotating new data.
- A new model following this direction that achieves state-of-the-art (SOTA) performance.