
mlpack is an intuitive, fast, and flexible header-only C++ machine learning library

Posted: 2023-09-30 09:44:52
Tags: intuitive, make, mlpack, fast, only, Armadillo, build, install, bindings

https://github.com/mlpack/mlpack

 


 


a fast, header-only machine learning library


Download: current stable version (4.2.1)

mlpack is an intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "Swiss Army knife" for machine learning researchers.

mlpack's lightweight C++ implementation makes it ideal for deployment, and it can also be used for interactive prototyping via C++ notebooks (these can be seen in action on mlpack's homepage).

In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings and R bindings.


mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


0. Contents

  1. Citation details
  2. Dependencies
  3. Installing and using mlpack in C++
  4. Building mlpack bindings to other languages
    1. Command-line programs
    2. Python bindings
    3. R bindings
    4. Julia bindings
    5. Go bindings
  5. Building mlpack's test suite
  6. Further resources

1. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2023,
    title     = {mlpack 4: a fast, header-only C++ machine learning library},
    author    = {Ryan R. Curtin and Marcus Edel and Omar Shrit and 
                 Shubham Agrawal and Suryoday Basak and James J. Balamuta and 
                 Ryan Birmingham and Kartik Dutt and Dirk Eddelbuettel and 
                 Rishabh Garg and Shikhar Jaiswal and Aakash Kaushik and 
                 Sangyeon Kim and Anjishnu Mukherjee and Nanubala Gnana Sai and 
                 Nippun Sharma and Yashwant Singh Parihar and Roshan Swain and 
                 Conrad Sanderson},
    journal   = {Journal of Open Source Software},
    volume    = {8},
    number    = {82},
    pages     = {5026},
    year      = {2023},
    doi       = {10.21105/joss.05026},
    url       = {https://doi.org/10.21105/joss.05026}
}
 

Citations are beneficial for the growth and improvement of mlpack.

2. Dependencies

mlpack requires the following additional dependencies:

  • a C++14 compiler
  • Armadillo
  • ensmallen
  • cereal

If the STB library headers are available, mlpack also enables image loading support.

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

3. Installing and using mlpack in C++

See also the C++ quickstart.

Since mlpack is a header-only library, installing just the headers for use in a C++ application is trivial.

From the root of the sources, configure and install in the standard CMake way:

mkdir build && cd build/
cmake ..
sudo make install
 

If the cmake .. command fails due to unavailable dependencies, either use the -DDOWNLOAD_DEPENDENCIES=ON option (detailed in the following subsection) or make sure that mlpack's dependencies are installed, e.g. with the system package manager. For example, on Debian and Ubuntu, all relevant dependencies can be installed with sudo apt-get install libarmadillo-dev libensmallen-dev libcereal-dev libstb-dev g++ cmake.

Alternatively, since CMake v3.14.0 the cmake command can create the build folder itself, and so the above commands can be rewritten as follows:

cmake -S . -B build
sudo cmake --build build --target install
 

During configuration, CMake adjusts the file mlpack/config.hpp using the details of the local system. This file can be modified by hand as necessary before or after installation.

3.1. Additional build options

You can pass additional options to the cmake command to control the configuration and build process. Some of these options are:

  • -DDOWNLOAD_DEPENDENCIES=ON will automatically download mlpack's dependencies (ensmallen, Armadillo, and cereal). Installing Armadillo this way is not recommended and it is better to use your system package manager when possible (see below).
  • -DCMAKE_INSTALL_PREFIX=/install/root/ will set the root of the install directory to /install/root when make install is run.
  • -DDEBUG=ON will enable debugging symbols in any compiled bindings or tests.

There are also options to enable building bindings to each language that mlpack supports; those are detailed in the following sections.

Once the headers are installed with make install, using mlpack in an application only requires including it, so your program should contain:

#include <mlpack.hpp>
 

and when you link, be sure to link against Armadillo. If your example program is my_program.cpp, your compiler is GCC, and you would like to compile with OpenMP support (recommended) and optimizations, compile like this:

g++ -O3 -std=c++14 -o my_program my_program.cpp -larmadillo -fopenmp
 

Note that if you want to serialize (save or load) neural networks, you should add #define MLPACK_ENABLE_ANN_SERIALIZATION before including <mlpack.hpp>. If you don't define MLPACK_ENABLE_ANN_SERIALIZATION and your code serializes a neural network, a compilation error will occur.

See the C++ quickstart and the examples repository for some examples of mlpack applications in C++, with corresponding Makefiles.

3.1.a. Linking with autodownloaded Armadillo

When the autodownloader is used to download Armadillo (-DDOWNLOAD_DEPENDENCIES=ON), the Armadillo runtime library is not built and Armadillo must be used in header-only mode. The autodownloader also does not download dependencies of Armadillo such as OpenBLAS. For this reason, it is recommended to instead install Armadillo using your system package manager, which will also install the dependencies of Armadillo. For example, on Ubuntu and Debian systems, Armadillo can be installed with

sudo apt-get install libarmadillo-dev
 

and other package managers such as dnf, brew, and pacman also have Armadillo packages available.

If the autodownloader is used to provide Armadillo, mlpack programs cannot be linked with -larmadillo. Instead, you must link directly with the dependencies of Armadillo. For example, on a system that has OpenBLAS available, compilation can be done like this:

g++ -O3 -std=c++14 -o my_program my_program.cpp -lopenblas -fopenmp
 

See the Armadillo documentation for more information on linking Armadillo programs.

3.2. Reducing compile time

mlpack is a template-heavy library, and if care is not taken, it can greatly increase the compilation time of a project. Fortunately, there are a number of ways to reduce compilation time:

  • Include individual headers, like <mlpack/methods/decision_tree.hpp>, if you are only using one component, instead of <mlpack.hpp>. This reduces the amount of work the compiler has to do.

  • Only use the MLPACK_ENABLE_ANN_SERIALIZATION definition if you are serializing neural networks in your code. When this define is enabled, compilation time will increase significantly, as the compiler must generate code for every possible type of layer. (The large amount of extra compilation overhead is why this is not enabled by default.)

  • If you are using mlpack in multiple .cpp files, consider using extern templates so that the compiler instantiates each template only once: add an explicit template instantiation for each mlpack template type you want to use in a single .cpp file, and then use extern template declarations in the other files to tell the compiler that the instantiation exists elsewhere.

Other strategies exist too, such as precompiled headers, compiler options, ccache, and others.
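The extern-template strategy from the list above can be sketched in plain C++. RunningMean is a hypothetical class template standing in for whichever mlpack template types your project uses. The three commented sections would normally be three separate files; they are combined here only so the sketch compiles on its own, which is legal because an explicit instantiation declaration may precede the matching definition in the same translation unit.

```cpp
#include <cstddef>
#include <vector>

// --- running_mean.hpp (hypothetical shared header) ---
template<typename T>
class RunningMean
{
 public:
  void Add(const T& value) { sum += value; ++count; }
  T Mean() const { return count == 0 ? T() : sum / static_cast<T>(count); }

 private:
  T sum = T();
  std::size_t count = 0;
};

// Explicit instantiation declaration: files that include the header are told
// that RunningMean<double> is instantiated elsewhere, so they skip doing it.
extern template class RunningMean<double>;

// --- running_mean.cpp (exactly one file in the project) ---
// Explicit instantiation definition: the single place the code is generated.
template class RunningMean<double>;

// --- any other .cpp file that uses the template ---
double MeanOf(const std::vector<double>& values)
{
  RunningMean<double> mean;
  for (double v : values)
    mean.Add(v);
  return mean.Mean();
}
```

With an mlpack type, the same two lines (the extern template declaration in a shared header and the explicit instantiation in one .cpp file) would name that type with its full template arguments instead of RunningMean<double>.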

4. Building mlpack bindings to other languages

mlpack is not just a header-only library: it also comes with bindings to a number of other languages, which allows flexible use of mlpack's efficient implementations from languages other than C++.

In general, you should not need to build these by hand; they should be provided by either your system package manager or your language's package manager.

Building the bindings for a particular language is done by calling cmake with different options; each example below shows how to configure an individual set of bindings, but it is of course possible to combine the options and build bindings for many languages at once.

4.i. Command-line programs

See also the command-line quickstart.

The command-line programs have no extra dependencies. The set of programs that will be compiled is detailed and documented on the command-line program documentation page.

From the root of the mlpack sources, run the following commands to build and install the command-line bindings:

mkdir build && cd build/
cmake -DBUILD_CLI_PROGRAMS=ON ../
make
sudo make install
 

You can use make -j<N>, where N is the number of cores on your machine, to build in parallel; e.g., make -j4 will use 4 cores to build.

4.ii. Python bindings

See also the Python quickstart.

mlpack's Python bindings are available on PyPI and conda-forge, and can be installed with either pip install mlpack or conda install -c conda-forge mlpack. These sources are recommended, as building the Python bindings by hand can be complex.

With that in mind, if you would still like to manually build the mlpack Python bindings, first make sure that the following Python packages are installed:

  • setuptools
  • wheel
  • cython >= 0.24
  • numpy
  • pandas >= 0.15.0

Now, from the root of the mlpack sources, run the following commands to build and install the Python bindings:

mkdir build && cd build/
cmake -DBUILD_PYTHON_BINDINGS=ON ../
make
sudo make install
 

You can use make -j<N>, where N is the number of cores on your machine, to build in parallel; e.g., make -j4 will use 4 cores to build. You can also specify a custom Python interpreter with the CMake option -DPYTHON_EXECUTABLE=/path/to/python.

4.iii. R bindings

See also the R quickstart.

mlpack's R bindings are available as the R package mlpack on CRAN. You can install the package by running install.packages('mlpack'), and this is the recommended way of getting mlpack in R.

If you still wish to build the R bindings by hand, first make sure the following dependencies are installed:

  • R >= 4.0
  • Rcpp >= 0.12.12
  • RcppArmadillo >= 0.9.800.0
  • RcppEnsmallen >= 0.2.10.0
  • roxygen2
  • testthat
  • pkgbuild

These can be installed with install.packages() inside of your R environment. Once the dependencies are available, you can configure mlpack and build the R bindings by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_R_BINDINGS=ON ../
make
sudo make install
 

You may need to specify the location of the R program in the cmake command with the option -DR_EXECUTABLE=/path/to/R.

Once the build is complete, a tarball can be found under the build directory in src/mlpack/bindings/R/, and it can be installed into your R environment with a command like install.packages('mlpack_3.4.3.tar.gz', repos=NULL, type='source'), substituting the name of the tarball that was actually built.

4.iv. Julia bindings

See also the Julia quickstart.

mlpack's Julia bindings are available by installing the mlpack.jl package using Pkg.add("mlpack"). Building, packaging, and distributing mlpack's Julia bindings is quite complex, so it is recommended to simply use the version available through Pkg. If you want to build the bindings by hand anyway, you can configure and build them by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_JULIA_BINDINGS=ON ../
make
 

If CMake cannot find your Julia installation, you can add -DJULIA_EXECUTABLE=/path/to/julia to the CMake configuration step.

Note that the make install step is not done above, since the Julia binding build system was not meant to be installed directly. Instead, to use handbuilt bindings (for instance, to test them), one option is to start Julia with JULIA_PROJECT set as an environment variable:

cd build/src/mlpack/bindings/julia/mlpack/
JULIA_PROJECT=$PWD julia
 

and then the Julia statement using mlpack should work.

4.v. Go bindings

See also the Go quickstart.

To build mlpack's Go bindings, ensure that Go >= 1.11.0 is installed, and that the Gonum package is available. You can use go get to install mlpack for Go:

go get -u -d mlpack.org/v1/mlpack
cd ${GOPATH}/src/mlpack.org/v1/mlpack
make install
 

The process of building the Go bindings by hand is a little tedious, so following the steps above is recommended. However, if you wish to build the Go bindings by hand anyway, you can do this by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_GO_BINDINGS=ON ../
make
sudo make install
 

5. Building mlpack's test suite

mlpack contains an extensive test suite that exercises every part of the codebase. It is easy to build and run the tests with CMake and CTest, as below:

mkdir build && cd build/
cmake -DBUILD_TESTS=ON ../
make
ctest .
 

If you want to test the bindings too, you will have to adapt the CMake configuration command to turn on the language bindings that you want to test; see the previous sections for details.

6. Further resources

More documentation is available for both users and developers, in the form of user documentation, tutorials, and developer documentation.

To learn about the development goals of mlpack in the short- and medium-term future, see the vision document.

If you have problems, find a bug, or need help, you can visit the mlpack help page or mlpack on GitHub. Alternatively, mlpack help can be found on Matrix at #mlpack; see also the community page.

 

From: https://www.cnblogs.com/ztguang/p/17737628.html
