Data collator used for BART denoising language modeling. The code is largely copied from
`<https://github.com/morganmcg1/rotobart/blob/main/data_collator.py#L223>`__.
For details on how BART denoising language modeling works, see
the `official paper <https://arxiv.org/pdf/1910.13461.pdf>`__
or the `official code for preprocessing <https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/denoising_dataset.py>`__.

- Official code: https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/denoising_dataset.py
- Transformers example code: https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_bart_dlm_flax.py
- Other example code: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py#L223
- Example code for BERT pre-training data processing: https://blog.csdn.net/Finks_Chen/article/details/119334214
- Another data-processing toolkit: https://github.com/prajdabre/yanmtt
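
For orientation, the two noising steps the BART paper uses for pre-training (sentence permutation and text infilling) can be sketched as below. This is a simplified illustration, not the fairseq or rotobart implementation: the function names, the ``<mask>`` string, and the retry limit are assumptions made for the example. The 30% mask ratio and the Poisson span length with lambda = 3 follow the paper.

```python
import math
import random

MASK = "<mask>"  # placeholder mask token (assumption for this sketch)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def permute_sentences(sentences, rng):
    """Sentence permutation: return the sentences in a random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def text_infilling(tokens, rng, mask_ratio=0.3, poisson_lambda=3.0, max_attempts=100):
    """Text infilling: replace random spans with a single MASK token each.

    Span lengths are drawn from Poisson(poisson_lambda). A zero-length span
    inserts a MASK without removing any token.
    """
    result = list(tokens)
    budget = int(round(len(result) * mask_ratio))  # tokens we may remove
    masked = 0
    for _ in range(max_attempts):
        if masked >= budget:
            break
        length = min(sample_poisson(poisson_lambda, rng), budget - masked)
        start = rng.randrange(0, len(result) - length + 1)
        if MASK in result[start:start + length]:
            continue  # skip spans that overlap an existing mask
        result[start:start + length] = [MASK]
        masked += length
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    sentences = ["the cat sat .", "it was warm .", "then it slept ."]
    tokens = " ".join(permute_sentences(sentences, rng)).split()
    print(text_infilling(tokens, rng))
```

A real collator would additionally operate on token ids, respect special tokens and padding, and build the decoder inputs and labels; those parts are omitted here.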