Task description

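Phoneme classification: we have training data that pairs audio with phonemes, and we want to train a model that learns this correspondence so that, given new audio, it can predict the phonemes.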

Phonemes

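A phoneme is the smallest unit of speech that distinguishes meaning in a given human language, and it is the basic concept of phonological analysis. Every language has its own phoneme inventory.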

Audio processing

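The continuous audio signal is split, by a feature-extraction procedure, into a sequence of frames, and each frame is labeled with one phoneme (a single phoneme usually spans several consecutive frames).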
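A minimal sketch of the framing step, assuming a 16 kHz waveform, 25 ms windows and a 10 ms hop (common speech settings, not values stated in this post). It only shows the splitting into overlapping frames; turning each frame into the 39-dim feature vector described below (for example with MFCCs) is not shown, and the function name is hypothetical.

```python
import torch

def frame_signal(wave: torch.Tensor, sample_rate: int = 16000,
                 win_ms: float = 25.0, hop_ms: float = 10.0) -> torch.Tensor:
    """Split a 1-D waveform into overlapping frames: (n_samples,) -> (n_frames, win_len).

    Assumed settings: 25 ms window, 10 ms hop (hypothetical defaults).
    """
    win_len = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    if wave.numel() < win_len:
        raise ValueError("waveform is shorter than one window")
    # unfold slides a window of win_len samples along dim 0 with step hop_len
    return wave.unfold(0, win_len, hop_len)

wave = torch.randn(16000)      # one second of fake audio
frames = frame_signal(wave)
print(frames.shape)            # torch.Size([98, 400])
```

Longer clips produce more frames, which is why n_frames differs between samples.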

Data

The overall data layout is as follows:

feat
- test
  
  1078 test samples. Each sample's features are stored in a file named id.pt, where id uniquely identifies one sample (one audio clip). Loading a file with torch.load gives a Tensor of shape (n_frames, feature_dim).
  - n_frames
    
    The number of frames produced from one audio sample (see Audio processing above); different samples yield different numbers of frames.
  - feature_dim
    
    The dimension of the feature vector computed for each frame; it is 39 for every frame.
- train
  
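  Features of all training samples, stored as id.pt files in the same format as test. These can be further split into a training set and a validation set.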
- test_split.txt
  
  The ids of all test samples.
- train_labels.txt
  
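  The labels of all training samples. In each line, the first column is the sample id and the remaining columns are the phoneme labels of that sample's frames, one per frame.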
- train_split.txt
  
  The ids of all training samples.
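A minimal sketch of reading this layout and holding out a validation set, assuming the two .txt files sit in the dataset root next to feat/, with whitespace-separated fields and one sample per line. All helper names are hypothetical, and the split is a seeded shuffle that sends a train_ratio fraction of the ids to training. The bottom half builds a tiny fake dataset in a temporary directory so the sketch runs without the real data.

```python
import os
import random
import tempfile
import torch

def read_ids(root, mode):
    # {mode}_split.txt: one sample id per line
    with open(os.path.join(root, f"{mode}_split.txt")) as f:
        return [line.strip() for line in f if line.strip()]

def read_labels(root):
    # train_labels.txt: "<id> <label_1> ... <label_n_frames>" per line
    labels = {}
    with open(os.path.join(root, "train_labels.txt")) as f:
        for line in f:
            parts = line.split()
            if parts:
                labels[parts[0]] = [int(p) for p in parts[1:]]
    return labels

def load_feat(root, mode, sample_id):
    # feat/{mode}/{id}.pt holds a (n_frames, 39) tensor
    return torch.load(os.path.join(root, "feat", mode, f"{sample_id}.pt"))

def split_ids(ids, train_ratio=0.8, seed=0):
    # shuffle once; the first train_ratio fraction trains, the rest validates
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

# Tiny fake dataset (10 clips of 3..12 frames) to exercise the helpers
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "feat", "train"))
ids = [f"utt{i}" for i in range(10)]
with open(os.path.join(root, "train_split.txt"), "w") as f:
    f.write("\n".join(ids) + "\n")
with open(os.path.join(root, "train_labels.txt"), "w") as f:
    for i, sid in enumerate(ids):
        n_frames = 3 + i
        torch.save(torch.randn(n_frames, 39),
                   os.path.join(root, "feat", "train", f"{sid}.pt"))
        f.write(sid + " " + " ".join(str(j % 41) for j in range(n_frames)) + "\n")

labels = read_labels(root)
train_ids, val_ids = split_ids(read_ids(root, "train"), train_ratio=0.8)
x = load_feat(root, "train", train_ids[0])
print(x.shape, len(labels[train_ids[0]]))  # one label per frame
```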

Code details

torch.permute()

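Rearranges the dimensions of a Tensor. The arguments 0, 1, 2, ... list, for each position of the result, which original dimension goes there.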

import torch

x = torch.randn((2, 3, 4))
print(x.size())
x = x.permute(2, 0, 1)
print(x.size())

# output
# torch.Size([2, 3, 4])
# torch.Size([4, 2, 3])
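A short sketch making the rule above concrete: result dimension i is original dimension dims[i], and elements are moved, not recomputed. It also shows that permute returns a non-contiguous view, so flattening with .view() generally needs .contiguous() first, while .reshape() handles that by itself.

```python
import torch

x = torch.randn(2, 3, 4)
dims = (2, 0, 1)
y = x.permute(*dims)

# Rule 1: result dim i has the size of original dim dims[i]
assert all(y.shape[i] == x.shape[d] for i, d in enumerate(dims))
# Rule 2: y[k, i, j] is the same element as x[i, j, k]
assert torch.equal(y[3, 1, 2], x[1, 2, 3])

print(y.is_contiguous())      # False: permute only changes strides
z = y.contiguous().view(-1)   # copy into contiguous memory, then flatten
w = y.reshape(-1)             # reshape copies when needed
print(torch.equal(z, w))      # True
```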

Python division

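Python has two division operators:

/ is true division: in Python 3 it always returns a float, whatever the operand types.
// is floor division: the result is rounded down (toward negative infinity), which matches simply dropping the fractional part only when the result is non-negative.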
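A few concrete cases, in Python 3. The last lines show a likely reason floor division shows up in this homework: computing how many neighbouring frames lie on each side of the center frame for an odd concat_nframes (the variable k is a hypothetical name).

```python
# Python 3 semantics
print(7 / 2)        # 3.5   true division always returns a float
print(6 / 2)        # 3.0   ...even when the result is a whole number
print(7 // 2)       # 3     floor division
print(-7 // 2)      # -4    rounds toward negative infinity, not toward zero
print(7.0 // 2)     # 3.0   floor division of floats gives a float
print(int(-7 / 2))  # -3    int() truncates toward zero, for comparison

concat_nframes = 17
k = concat_nframes // 2   # frames on each side of the center frame
print(k)                  # 8
```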

Tuning ("alchemy")

Sample Baseline

Running the sample code as-is is enough.

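import torch.nn as nn

# inside BasicBlock.__init__: each hidden layer is a Linear layer followed by ReLU
self.block = nn.Sequential(
    nn.Linear(input_dim, output_dim),
    nn.ReLU(),
)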
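For context, a self-contained sketch of how such a block can be stacked into the full frame classifier. It assumes 41 phoneme classes and an input of 39 * concat_nframes features, and it counts hidden_layers as extra blocks after the first one; these are assumptions for illustration, not a copy of the sample code.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # one hidden layer: Linear followed by ReLU
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

class Classifier(nn.Module):
    # assumed: output_dim=41 phoneme classes
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super().__init__()
        self.fc = nn.Sequential(
            BasicBlock(input_dim, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.fc(x)  # (batch, output_dim) logits, one row per frame

concat_nframes = 1
model = Classifier(input_dim=39 * concat_nframes, hidden_layers=1, hidden_dim=256)
logits = model(torch.randn(512, 39 * concat_nframes))
print(logits.shape)  # torch.Size([512, 41])
```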

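concat_nframes = 1              # number of frames concatenated into one input; must be odd (k frames on each side + the center frame)
train_ratio = 0.8               # fraction of the training data used for training; the rest is the validation set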

# training parameters
batch_size = 512                # batch size
num_epoch = 5                   # the number of training epoch
learning_rate = 0.0001          # learning rate

# model parameters
hidden_layers = 1               # the number of hidden layers
hidden_dim = 256                # the hidden dim
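concat_nframes matters because a single 39-dim frame carries very little context. A minimal sketch of the concatenation, assuming the boundary is padded by repeating the first/last frame; the function name concat_neighbors is hypothetical, and it uses unfold plus the permute trick from above rather than reproducing the sample code's own helper.

```python
import torch

def concat_neighbors(x: torch.Tensor, concat_n: int) -> torch.Tensor:
    """(n_frames, feat_dim) -> (n_frames, concat_n * feat_dim).

    Row i holds frames i-k .. i+k with k = concat_n // 2; frames past the
    edges are replaced by the first/last frame (assumed padding scheme).
    """
    assert concat_n % 2 == 1, "concat_n must be odd"
    if concat_n == 1:
        return x
    n, d = x.shape
    k = concat_n // 2
    padded = torch.cat([x[:1].expand(k, d), x, x[-1:].expand(k, d)], dim=0)  # (n + 2k, d)
    windows = padded.unfold(0, concat_n, 1)         # (n, d, concat_n)
    return windows.permute(0, 2, 1).reshape(n, concat_n * d)

toy = torch.arange(3, dtype=torch.float32).unsqueeze(1)  # frames 0, 1, 2 with 1 feature each
print(concat_neighbors(toy, 3))
# tensor([[0., 0., 1.],
#         [0., 1., 2.],
#         [1., 2., 2.]])

feats = torch.randn(100, 39)
stacked = concat_neighbors(feats, 17)
print(stacked.shape)  # torch.Size([100, 663])
```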

Medium Baseline

Increase concat_nframes, enlarge the model's width and depth, and train for more epochs; in these runs a larger batch_size also improved performance.

concat_nframes = 17
train_ratio = 0.9
batch_size = 2048
num_epoch = 20
learning_rate = 0.001
hidden_layers = 5
hidden_dim = 1700
dropout = 0.35
# plus batch normalization (BN)
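A sketch of what a hidden block with BN and dropout might look like at these settings. The layer order Linear -> BatchNorm1d -> ReLU -> Dropout, the 41 output classes, and counting hidden_layers as extra blocks after the first are assumptions; the post does not show the author's exact block. Note the switch between train() and eval(): BatchNorm uses batch statistics and dropout is active only in training mode.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # assumed layer order: Linear -> BatchNorm1d -> ReLU -> Dropout
    def __init__(self, input_dim, output_dim, dropout=0.35):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.BatchNorm1d(output_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.block(x)

def build_classifier(input_dim, output_dim=41, hidden_layers=5, hidden_dim=1700, dropout=0.35):
    layers = [BasicBlock(input_dim, hidden_dim, dropout)]
    layers += [BasicBlock(hidden_dim, hidden_dim, dropout) for _ in range(hidden_layers)]
    layers.append(nn.Linear(hidden_dim, output_dim))
    return nn.Sequential(*layers)

model = build_classifier(input_dim=39 * 17)   # concat_nframes = 17
model.train()
out = model(torch.randn(8, 39 * 17))          # BatchNorm1d needs batch size > 1 in train mode
model.eval()                                  # running BN stats, dropout disabled
with torch.no_grad():
    pred = model(torch.randn(4, 39 * 17)).argmax(dim=1)  # predicted phoneme per frame
print(out.shape, pred.shape)
```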

Score: 0.756

Private score: 0.75667

Reaching the boss baseline requires an RNN; I will come back and add that later.
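The post stops at the RNN hint. Purely as a hypothetical sketch (not the author's model), a bidirectional LSTM can label every frame of a whole utterance at once, drawing context from both directions instead of a fixed concat_nframes window. The class name and all sizes are assumptions, and handling variable-length batches (padding, masking) is left out.

```python
import torch
import torch.nn as nn

class BiLSTMFrameClassifier(nn.Module):
    """Hypothetical: (batch, n_frames, 39) -> (batch, n_frames, 41) per-frame logits."""

    def __init__(self, feat_dim=39, hidden_dim=256, num_layers=3, num_classes=41, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True, dropout=dropout)
        self.head = nn.Linear(2 * hidden_dim, num_classes)  # forward + backward states

    def forward(self, x):
        out, _ = self.lstm(x)   # (batch, n_frames, 2 * hidden_dim)
        return self.head(out)   # (batch, n_frames, num_classes)

model = BiLSTMFrameClassifier()
x = torch.randn(2, 100, 39)                 # 2 fake utterances of 100 frames
logits = model(x)
labels = torch.randint(0, 41, (2, 100))
# CrossEntropyLoss wants (N, C) logits, so fold the frame axis into the batch axis
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 41), labels.reshape(-1))
print(logits.shape, loss.item())
```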

From： https://www.cnblogs.com/dctwan/p/17348342.html


HW2：classification