In this paper, we study the trade-off between cleaning existing noisy labels and annotating new examples under a fixed budget, i.e., in a reduced-data setting.
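
Concretely, one natural way to formalize this trade-off (the notation here is illustrative, not the paper's own) is as a budget-allocation problem: if re-examining one noisy label costs $c_c$ and annotating one fresh example costs $c_a$, the question is how to split a total budget $B$,
$$
c_c \cdot n_{\text{clean}} + c_a \cdot n_{\text{annot}} \le B,
$$
where $n_{\text{clean}}$ is the number of noisy labels cleaned and $n_{\text{annot}}$ the number of new examples annotated, so as to maximize downstream model performance.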

Our main contributions are:

  1. A comprehensive study of learning with noisy NLP data in practical settings:
     - applied to modern pre-trained models, with older models as baselines;
     - applied to diverse NLP tasks of varying difficulty, not just classification.
  2. A new direction: cleaning existing noisy labels rather than annotating new data.
  3. A new model in this direction that achieves state-of-the-art performance.