
ML.NET sample notes 4: running the samples on ML.NET v2

Date: 2023-12-20 09:57:55
Tags: TorchSharp, ml, libtorch, samples, https, ML, net, com, Microsoft

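1 ML.NET version update

The current Microsoft.ML release versions are as follows:

The samples at https://gitee.com/mirrors_feiyun0112/machinelearning-samples.zh-cn use version 1.6.0.
To switch the sample projects to a different version:
1 Directory.Build.props nuget.config
Edit the Directory.Build.props file in the samples directory: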


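In that file the Microsoft.ML version is changed to **2.0.1** (the old value is struck out); the other version value in the file, 0.18.0, is left unchanged.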
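For illustration, a Directory.Build.props carrying these values could look like the sketch below. The property names `MicrosoftMLVersion` and `MicrosoftMLPreviewVersion` are assumptions, since the XML markup did not survive in this post; only the values 2.0.1 and 0.18.0 come from it. Check the names against the actual file in the samples directory before editing.

```xml
<Project>
  <PropertyGroup>
    <!-- Assumed property name; previously the 1.x version used by the samples -->
    <MicrosoftMLVersion>2.0.1</MicrosoftMLVersion>
    <!-- Assumed property name; value kept as it appears in the post -->
    <MicrosoftMLPreviewVersion>0.18.0</MicrosoftMLPreviewVersion>
  </PropertyGroup>
</Project>
```

Because MSBuild imports Directory.Build.props into every project below that folder, one edit of this kind changes the version for all samples at once.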


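2 Open the samples\csharp\All-Samples.sln solution
Visual Studio then loads the new version of the Microsoft.ML libraries.

In the older projects, the reference to the ml.net library looked similar to the following:

2 Updating the samples to ml.net 2.0.1

3 Sentiment analysis sample [SentimentAnalysis]

The settings in SentimentAnalysisConsoleApp.csproj are changed as follows: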
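As a sketch, the edited property group of the project file might look like the following. The element names `OutputType`, `TargetFramework` and `LangVersion` are the standard MSBuild properties and are assumed here, because the markup did not survive in this post; only the values Exe, net6.0 and latest come from it. Note that the target framework moniker is written `net6.0`, without a leading dot.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Was netcoreapp2.1 in the original sample project -->
    <TargetFramework>net6.0</TargetFramework>
    <LangVersion>latest</LangVersion>
  </PropertyGroup>
</Project>
```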

The original values were Exe and **netcoreapp2.1** (the target framework). They are changed to Exe, **net6.0** and latest.

The difference in the final build output is as follows. Build output with ml.net v2 targeting .NET 6:

![](https://cdn.nlark.com/yuque/0/2023/png/2964849/1703036631994-5b2ce9b6-8404-4dc8-ba12-c400939c8b4d.png#averageHue=%23f9f8f7&id=txQRo&originHeight=982&originWidth=1165&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

Running the program:

![](https://cdn.nlark.com/yuque/0/2023/png/2964849/1703036632316-e337db90-d185-49ac-ab2a-b65535e7e1de.png#averageHue=%23fcf5e2&id=KlWIL&originHeight=543&originWidth=1404&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

4 Spam detection

SpamDetectionConsoleApp, run after applying the same settings:

![](https://cdn.nlark.com/yuque/0/2023/png/2964849/1703036632637-f37893ac-1301-4050-a855-dbb4cc56ab11.png#averageHue=%23faf2de&id=rbSFU&originHeight=980&originWidth=1340&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

5 Official ML.NET 2 samples

[https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/MLNET2](https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/MLNET2)
[https://gitee.com/mirrors_dotnet/machinelearning-samples](https://gitee.com/mirrors_dotnet/machinelearning-samples)
The second link is the gitee mirror in China. At 1.8G it is a very large download, and for now it is available in English only.

6 AutoML

  • AutoMLQuickStart - C# console application that shows how to get started with the AutoML API.
  • AutoMLAdvanced - C# console application that shows the following concepts:
    • Modifying column inference results
    • Excluding trainers
    • Configuring monitoring
    • Choosing tuners
    • Cancelling experiments
  • AutoMLEstimators - C# console application that shows how to:
    • Customize search spaces
    • Create sweepable estimators
  • AutoMLTrialRunner - C# console application that shows how to create your own trial runner for the Text Classification API.

7 Natural Language Processing (NLP)

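8 Sample walkthrough

Data source: download from this address

9 Sentence similarity (SentenceSimilarity)

[The test machine had no CUDA environment, so training ran on the CPU.]
train.csv: the training set, containing products, search terms and relevance scores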

| id | product_uid | product_title | search_term | relevance |
|---|---|---|---|---|
| 2 | 100001 | Simpson Strong-Tie 12-Gauge Angle | angle bracket | 3 |
| 3 | 100001 | Simpson Strong-Tie 12-Gauge Angle | l bracket | 2.5 |
| 9 | 100002 | BEHR Premium Textured DeckOver 1-gal. #SC-141 Tugboat Wood and Concrete Coating | deck over | 3 |

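The repository does not include home-depot-sentence-similarity.csv. For how the original train.csv relates to home-depot-sentence-similarity.csv, and how to download and generate it, see
https://github.com/dotnet/machinelearning-samples/issues/982. I wrote a preprocessing step that merges the CSV files into the format the sample code expects: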
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace SentenceSimilarity
{
    internal class GenData
    {
        // One row of train.csv:
        // id product_uid product_title search_term relevance
        // 2 100001 Simpson Strong-Tie 12-Gauge Angle angle bracket 3
        public class HomeDepot
        {
            [LoadColumn(0)]
            public int id { get; set; }
            [LoadColumn(1)]
            public int product_uid { get; set; }
            [LoadColumn(2)]
            public string product_title { get; set; }
            [LoadColumn(3)]
            public string search_term { get; set; }
            [LoadColumn(4)]
            public string relevance { get; set; }
        }

        // https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.custommappingcatalog.custommapping?view=ml-dotnet
        [CustomMappingFactoryAttribute("product_description")]
        private class ProdDescCustomAction : CustomMappingFactory<HomeDepot, CustomMappingOutput>
        {
            // Custom mapping between input and output rows: look up the
            // description for the row's product_uid in the prodDesc dictionary.
            public static void CustomAction(HomeDepot input, CustomMappingOutput output)
                => output.product_description = prodDesc[input.product_uid.ToString()];

            public override Action<HomeDepot, CustomMappingOutput> GetMapping()
                => CustomAction;
        }

        // Defines only the column generated by the custom mapping
        // transformation, in addition to the columns already present.
        private class CustomMappingOutput
        {
            public string product_description { get; set; }
        }

        // product_uid -> product_description, filled from product_descriptions.csv
        static Dictionary<string, string> prodDesc = new Dictionary<string, string>();

        static void Main(string[] args)
        {
            var mlContext = new MLContext(seed: 1);

            // Paths are relative to the build output folder (four levels up).
            var DataPath = Path.GetFullPath(@"..\..\..\..\Data\product_descriptions.csv");
            {
                // Step 1: load the descriptions into the lookup dictionary.
                IDataView dv = mlContext.Data.LoadFromTextFile(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true,
                    columns: new[] {
                        new TextLoader.Column("product_uid", DataKind.String, 0),
                        new TextLoader.Column("product_description", DataKind.String, 1)
                    });
                foreach (var row in dv.Preview(maxRows: 150_000).RowView)
                {
                    string uid = "", desc = "";
                    foreach (KeyValuePair<string, object> column in row.Values)
                    {
                        if (column.Key == "product_uid")
                        {
                            uid = column.Value.ToString();
                        }
                        else
                        {
                            desc = column.Value.ToString();
                        }
                    }
                    prodDesc[uid] = desc;
                }
            }

            // Step 2: load train.csv using the HomeDepot column layout.
            DataPath = Path.GetFullPath(@"..\..\..\..\Data\train.csv");
            IDataView dataView = mlContext.Data.LoadFromTextFile<HomeDepot>(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true);

            // Print the first rows as a quick sanity check.
            var preViewTransformedData = dataView.Preview(maxRows: 5);
            foreach (var row in preViewTransformedData.RowView)
            {
                var ColumnCollection = row.Values;
                string lineToPrint = "Row--> ";
                foreach (KeyValuePair<string, object> column in ColumnCollection)
                {
                    lineToPrint += $"| {column.Key}:{column.Value}";
                }
                Console.WriteLine(lineToPrint + "\n");
            }

            // Step 3: append product_description to every row and save the merged file.
            var pipeline = mlContext.Transforms.CustomMapping(new ProdDescCustomAction().GetMapping(), contractName: "product_description");
            var transformedData = pipeline.Fit(dataView).Transform(dataView);
            //mlContext.ComponentCatalog.RegisterAssembly(typeof(IsUnderThirtyCustomAction).Assembly);
            Console.WriteLine("save file");
            using FileStream fs = new FileStream(Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv"), FileMode.Create);
            mlContext.Data.SaveAsText(transformedData, fs, schema: false, separatorChar: ',');
        }
    }
}
For details see https://gitee.com/iamops/x-unix-dotnet/blob/main/ml.net2/SentenceSimilarity/GenData.cs

Once the data is in place, the first run downloads the model file, with output similar to the following:
[Source=NasBertTrainer; TrainModel, Kind=Trace] Channel started
[Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel started
[Source=NasBertTrainer; Ensuring model file is present., Kind=Info] Downloading NasBert2000000.tsm from https://aka.ms/mlnet-resources/models/NasBert2000000.tsm to C:\Users\homelap\AppData\Local\Temp\mlnet\NasBert2000000.tsm
[Source=NasBertTrainer; Ensuring model file is present., Kind=Info] NasBert2000000.tsm: Downloaded 3620 bytes out of 17907563
...

TorchSharp has no official release yet, and the samples have many problems at run time. After placing the data files as described above, running the sample directly fails with:
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:

at TorchSharp.torch+random.THSGenerator_manual_seed(Int64)

at TorchSharp.torch+random.manual_seed(Int64) ...

Following the instructions in https://github.com/dotnet/machinelearning/issues/6669 does not fix it either:

| ML.NET Version | TorchSharp Package Version |
|---|---|
| 2.0.0 | 0.98.1 |
| 2.0.1 | 0.98.1 |
| 3.0.0-preview | 0.98.3 |
| Next preview | 0.99.5 |
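As a sketch, pinning the packages to the ML.NET 2.0.1 row of this table in a project file might look like the following. The `TorchSharp-cpu` package name is the one mentioned later in this post; the `Microsoft.ML.TorchSharp` version shown is an assumption and should be checked on NuGet before use.

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.ML" Version="2.0.1" />
  <!-- Assumed version of the ML.NET TorchSharp integration package -->
  <PackageReference Include="Microsoft.ML.TorchSharp" Version="0.20.1" />
  <!-- CPU-only TorchSharp bundle, version taken from the table -->
  <PackageReference Include="TorchSharp-cpu" Version="0.98.1" />
</ItemGroup>
```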

With the versions set as above the run still fails, so let's dig into the TorchSharp source.

10 TorchSharp

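Preliminary structure of the libraries

As the structure shows, within the overall Microsoft.ML framework [Microsoft.ML.Core.dll Microsoft.ML.dll Microsoft.ML.PCA.dll Microsoft.ML.Transforms.dll Microsoft.ML.Data.dll Microsoft.ML.KMeansClustering.dll Microsoft.ML.StandardTrainers.dll],
a Microsoft.ML.TorchSharp layer [Microsoft.ML.TorchSharp.dll] is added for Torch, following the structure of the ML framework. An overview of that extension follows.

Separate libraries are provided for the CPU and GPU scenarios.
TorchSharp, TorchAudio and TorchVision are C# libraries for different domains. They wrap PyTorch's native library in an abstraction layer and serve Microsoft.ML.TorchSharp. The project also provides the LibTorchSharp library, written in C++, for TorchSharp/TorchAudio/TorchVision to call.
TorchSharp reaches LibTorchSharp through P/Invoke, and LibTorchSharp in turn calls PyTorch's native C++ library (libtorch).
The TorchSharp project configuration for wrapping the native library and packaging it is shown below.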
相关工程的参考地址:
https://github.com/dotnet/TorchSharp
https://github.com/dotnet/TorchSharpExamples
https://www.nuget.org/packages/TorchSharp/
As we build up to a v1.0 release, we will continue to make breaking changes, but only when we consider it necessary for usability. Similarity to the PyTorch experience is a primary design tenet, and we will continue on that path.
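As the quote above shows, TorchSharp has not reached 1.0, so its API changes quickly. In addition, versions published on NuGet do not always have a matching branch or tag in the official repository, so compatibility problems are significant.
For example, even the official examples at the tag https://github.com/dotnet/TorchSharpExamples/releases/tag/v0.95.4 fail to run
(TorchSharpExamples-0.95.4.tar\TorchSharpExamples-0.95.4\src\CSharp\CSharpExamples):

random.manual_seed(1); cannot be called at all and throws an exception.
The main branch, at version 0.100.5, runs normally.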




Back to the current sample:

TorchSharp.dll!TorchSharp.torch.random.manual_seed(long seed) Line 21623
at TorchSharp\torch.cs(21623)

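Version 0.98.1 has no branch or tag in the https://github.com/dotnet/TorchSharp repository, which makes the code hard to trace. It is unclear where the code for the version published at https://www.nuget.org/packages/TorchSharp-cpu comes from.
Version 0.99.3 (https://github.com/dotnet/TorchSharp/releases/tag/v0.99.3) is incompatible with the Microsoft.ML.TorchSharp version: parts of the implementation are missing at run time, probably because the two projects iterate at very different paces.
So I picked 0.98.2, which does have a branch, for analysis.

11 Debugging TorchSharp 0.98.2

Check out the https://github.com/dotnet/TorchSharp/tree/0.98.2 branch and trace through it. Following its DEVGUIDE.md, build with the Visual Studio tools (devenv):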

msbuild TorchSharp.sln

msbuild TorchSharp.sln /p:Configuration=Release /p:Platform=x64

libtorch versions correspond to PyTorch versions; for example, libtorch 1.6.0 corresponds to PyTorch 1.6.0.

  • Debug build

https://download.pytorch.org/libtorch/cpu/
https://download.pytorch.org/libtorch/cu113
https://download.pytorch.org/libtorch/cu113/libtorch-win-shared-with-deps-debug-1.11.0%2Bcu113.zip
https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-1.11.0%2Bcpu.zip
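The build downloads these prebuilt archives automatically:

libtorch-cpu\libtorch-win-shared-with-deps-debug-1.11.0%2Bcpu.zip 650M
libtorch-cuda-11.3\libtorch-win-shared-with-deps-debug-1.11.0%2Bcu113.zip 2.7G
These files are very large. The release archives, for comparison:
libtorch-cpu\libtorch-win-shared-with-deps-1.11.0%2Bcpu.zip 143M
libtorch-cuda-11.3\libtorch-win-shared-with-deps-1.11.0%2Bcu113.zip 2G
Once everything is in place, debugging the failing function directly in Visual Studio works without problems.

Copying the sample code over and running it also works.

  • Release build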

https://download.pytorch.org/libtorch/cu113/libtorch-win-shared-with-deps-1.11.0%2Bcu113.zip
https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-1.11.0%2Bcpu.zip

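12 Getting the sample to run

After some experimenting, it turns out that only the following libtorch libraries need to be replaced:

The files shipped with the sample by default do not work. After replacing them:

My preliminary guess is that the 0.98.2 package published on NuGet is inconsistent somewhere.
For the complete project, see
https://gitee.com/iamops/x-unix-dotnet/blob/main/ml.net2/SentenceSimilarity/SentenceSimilarity.csproj
Run output: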
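One way to automate such a replacement, sketched under stated assumptions, is to copy the DLLs of a locally unpacked libtorch 1.11.0 archive into the build output. `LibTorchDir` and its path are hypothetical, and copying every DLL in the `lib` folder is also an assumption, because the exact list of replaced libraries did not survive in this post. The project linked in this section may do it differently.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <!-- Hypothetical: folder where the libtorch CPU archive was unpacked -->
    <LibTorchDir>C:\libs\libtorch</LibTorchDir>
  </PropertyGroup>
  <ItemGroup>
    <!-- Copy the native libtorch DLLs next to the executable on build -->
    <None Include="$(LibTorchDir)\lib\*.dll"
          Link="%(Filename)%(Extension)"
          CopyToOutputDirectory="PreserveNewest" />
  </ItemGroup>
</Project>
```

If the NuGet package also places native files of the same name in the output folder, the two sources can conflict, so verify which copy actually ends up next to the executable.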

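13 Summary

Training with Torch on the CPU is indeed very slow.

SentenceSimilarity: this official sample mixes C# with PyTorch's native library, and because the versions on the Torch side are unstable it is troublesome to use. The specific causes are analyzed above.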

From: https://www.cnblogs.com/2018/p/17915818.html
