
Running Large Language Models locally – Your own ChatGPT-like AI in C#


For the past few months, a lot of news in tech as well as mainstream media has been around ChatGPT, an Artificial Intelligence (AI) product by the folks at OpenAI. ChatGPT is a Large Language Model (LLM) that is fine-tuned for conversation. While undervaluing the technology with this statement, it’s a smart-looking chat bot that you can ask questions about a variety of domains.

Until recently, using these LLMs required relying on third-party services and cloud computing platforms. To integrate any LLM into your own application, or simply to use one, you’d have to swipe your credit card with OpenAI, Microsoft Azure, or others.

However, with advancements in hardware and software, it is now possible to run these models locally on your own machine and/or server.

In this post, we’ll see how you can have your very own AI powered by a large language model running directly on your CPU!

Towards open-source models and execution – A little bit of history…

A few months after OpenAI released ChatGPT, Meta released LLaMA. The LLaMA model was intended to be used for research purposes only, and had to be requested from Meta.

However, someone leaked the weights of LLaMA, and this has spurred a lot of activity on the Internet. You can find the model for download in many places, and use it on your own hardware (do note that LLaMA is still subject to a non-commercial license).

In comes Alpaca, a fine-tuned LLaMA model by Stanford. And Vicuna, another fine-tuned LLaMA model. And WizardLM, and …

You get the idea: LLaMA spit out (sorry for the pun) a bunch of other models that are readily available to use.

While part of the community was training new models, others were working on making it possible to run these LLMs on consumer hardware. Georgi Gerganov released llama.cpp, a C++ implementation that can run the LLaMA model (and derivatives) on a CPU. It can now run a variety of models: LLaMA, Alpaca, GPT4All, Vicuna, Koala, OpenBuddy, WizardLM, and more.

There are also wrappers for a number of languages, including Python (llama-cpp-python), Node.js (llama-node), and .NET (SciSharp/LLamaSharp).

Let's put the last one from that list to the test!

Getting started with SciSharp/LLamaSharp

Have you heard about the SciSharp Stack? Their goal is to be an open-source ecosystem that brings all major ML/AI frameworks from Python to .NET – including LLaMA (and friends) through SciSharp/LLamaSharp.

LLamaSharp is a .NET binding of llama.cpp that provides APIs to work with LLaMA models. It works on Windows and Linux, and does not require you to deal with the underlying llama.cpp yourself. It does not support macOS at the time of writing.

Great! Now, what do you need to get started?

Since you’ll need a model to work with, let’s get that sorted first.

1. Download a model

LLamaSharp works with several models, but support depends on the version of LLamaSharp you use. Supported models are linked in the README; do go explore a bit.

For this blog post, we’ll be using LLamaSharp version 0.3.0 (the latest at the time of writing). We’ll also use the WizardLM model, more specifically the wizardLM-7B.ggmlv3.q4_1.bin model. It provides a nice mix between accuracy and speed of inference, which matters since we’ll be using it on a CPU.

There are a number of more accurate models (or faster, less accurate models), so do experiment a bit with what works best. In any case, make sure you have 2.8 GB to 8 GB of disk space for the variants of this model, and up to 10 GB of memory.
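As an example, the quantized weights can be fetched with curl. The Hugging Face repository name below (TheBloke/wizardLM-7B-GGML) is an assumption on my part; check the links in the LLamaSharp README for the current download locations.

```shell
# Download the 4-bit quantized WizardLM model (~4 GB) into a Models folder.
# The repository URL is an assumption; verify it against the LLamaSharp README.
mkdir -p Models
curl -L -o Models/wizardLM-7B.ggmlv3.q4_1.bin \
  https://huggingface.co/TheBloke/wizardLM-7B-GGML/resolve/main/wizardLM-7B.ggmlv3.q4_1.bin
```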

2. Create a console application and install LLamaSharp

Using your favorite IDE, create a new console application and copy in the model you have just downloaded. Next, install the LLamaSharp and LLamaSharp.Backend.Cpu packages. If you have a CUDA-capable GPU, you can use a CUDA backend package instead.
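If you prefer the command line over an IDE, the same setup can be done with the dotnet CLI. The assumption here is that the backend package is versioned in lockstep with the main package; adjust the version numbers if not.

```shell
# Create the project and pin the packages to the version used in this post.
dotnet new console -o LocalLLM
cd LocalLLM
dotnet add package LLamaSharp --version 0.3.0
dotnet add package LLamaSharp.Backend.Cpu --version 0.3.0
```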

Here’s our project to start with:

[Screenshot: LocalLLM project in JetBrains Rider]

With that in place, we can start creating our own chat bot that runs locally and does not need OpenAI to run.

3. Initializing the LLaMA model and creating a chat session

In Program.cs, start with the following snippet of code to load the model that we just downloaded:

using LLama;

var model = new LLamaModel(new LLamaParams(
    model: Path.Combine("..", "..", "..", "Models", "wizardLM-7B.ggmlv3.q4_1.bin"),
    n_ctx: 512,
    interactive: true,
    repeat_penalty: 1.0f,
    verbose_prompt: false));

This snippet loads the model from the directory where you stored your downloaded model in the previous step. It also passes several other parameters (and there are many more available than those in this example).

For reference:

  • n_ctx – The maximum number of tokens in an input sequence (in other words, how many tokens can your question/prompt be).
  • interactive – Specifies you want to keep the context in between prompts, so you can build on previous results. This makes the model behave like a chat.
  • repeat_penalty – Determines how strongly the model is penalized for repeating recent tokens (and helps keep responses from looping or rambling).
  • verbose_prompt – Toggles printing extra details about the prompt before inference.

Again, there are many more parameters available, most of which are explained in the llama.cpp repository.

Next, we can use our model to start a chat session:

var session = new ChatSession<LLamaModel>(model)
    .WithPrompt(...)
    .WithAntiprompt(...);

Of course, these ... don’t compile, but let’s explain first what is needed for a chat session.

The .WithPrompt() (or .WithPromptFile()) method specifies the initial prompt for the model. This can be left empty, but is usually a set of rules for the LLM. Find some example prompts in the llama.cpp repository, or write your own.

The .WithAntiprompt() method specifies the anti-prompt: the text that, when generated by the model, stops inference and hands control back to the user. In a chat setup, it's the model's cue that it's the user's turn again.

Here’s how to set up a chat session with an LLM that is Homer Simpson:

var session = new ChatSession<LLamaModel>(model)
    .WithPrompt("""
        You are Homer Simpson, and respond to User with funny Homer Simpson-like comments.

        User:
        """)
    .WithAntiprompt(new[] { "User: " });

We’ll see in a bit what results this Homer Simpson model gives, but generally you will want to be more detailed in what is expected from the LLM. Here’s an example chat session setup for a model called “LocalLLM” that is helpful as a pair programmer:

var session = new ChatSession<LLamaModel>(model)
    .WithPrompt("""
        You are a polite and helpful pair programming assistant.
        You MUST reply in a polite and helpful manner.
        When asked for your name, you MUST reply that your name is 'LocalLLM'.
        You MUST use Markdown formatting in your replies when the content is a block of code.
        You MUST include the programming language name in any Markdown code blocks.
        Your code responses MUST be using C# language syntax.

        User:
        """)
    .WithAntiprompt(new[] { "User: " });
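With the session configured, you still need a loop to actually talk to the model. The sketch below assumes that ChatSession.Chat(string) in LLamaSharp 0.3.0 streams the response as an IEnumerable<string>; verify the exact signature against the samples for the version you use.

```csharp
// Minimal chat loop sketch. The Chat(...) signature is an assumption based on
// LLamaSharp 0.3.0 samples; check it against the version you installed.
Console.Write("User: ");
while (true)
{
    var prompt = Console.ReadLine() + "\n";

    // Stream the model's reply piece by piece; when the anti-prompt
    // "User: " appears, the session yields control back for new input.
    foreach (var output in session.Chat(prompt))
    {
        Console.Write(output);
    }
}
```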

From: https://www.cnblogs.com/largeprob/p/17758855.html
