{fastcluster}: Fast Hierarchical Clustering Routines

Date: 2024-02-16 23:15:09

Tags: Clustering fastcluster stats Routines hclust ward method

1. Function code

The main function in this R package is hclust. Its source is as follows:

> fastcluster::hclust
function (d, method = "complete", members = NULL) 
{
    if (method == "ward") {
        message("The \"ward\" method has been renamed to \"ward.D\"; note new \"ward.D2\"")
        method <- "ward.D"
    }
    METHODS <- c("single", "complete", "average", "mcquitty", 
        "ward.D", "centroid", "median", "ward.D2")
    method <- pmatch(method, METHODS)
    if (is.na(method)) 
        stop("Invalid clustering method.")
    if (method == -1) 
        stop("Ambiguous clustering method.")
    dendrogram <- c(.Call(fastcluster, attr(d, "Size"), method, 
        d, members), list(labels = attr(d, "Labels"), method = METHODS[method], 
        call = match.call(), dist.method = attr(d, "method")))
    class(dendrogram) <- "hclust"
    return(dendrogram)
}

For comparison, here is the hclust function from the base package stats:

> stats::hclust
function (d, method = "complete", members = NULL) 
{
    METHODS <- c("ward.D", "single", "complete", "average", "mcquitty", 
        "median", "centroid", "ward.D2")
    if (method == "ward") {
        message("The \"ward\" method has been renamed to \"ward.D\"; note new \"ward.D2\"")
        method <- "ward.D"
    }
    i.meth <- pmatch(method, METHODS)
    if (is.na(i.meth)) 
        stop("invalid clustering method", paste("", method))
    if (i.meth == -1) 
        stop("ambiguous clustering method", paste("", method))
    n <- as.integer(attr(d, "Size"))
    if (is.null(n)) 
        stop("invalid dissimilarities")
    if (is.na(n) || n > 65536L) 
        stop("size cannot be NA nor exceed 65536")
    if (n < 2) 
        stop("must have n >= 2 objects to cluster")
    len <- as.integer(n * (n - 1)/2)
    if (length(d) != len) 
        (if (length(d) < len) 
            stop
        else warning)("dissimilarities of improper length")
    if (is.null(members)) 
        members <- rep(1, n)
    else if (length(members) != n) 
        stop("invalid length of members")
    storage.mode(d) <- "double"
    hcl <- .Fortran(C_hclust, n = n, len = len, method = as.integer(i.meth), 
        ia = integer(n), ib = integer(n), crit = double(n), members = as.double(members), 
        nn = integer(n), disnn = double(n), diss = d)
    hcass <- .Fortran(C_hcass2, n = n, ia = hcl$ia, ib = hcl$ib, 
        order = integer(n), iia = integer(n), iib = integer(n))
    structure(list(merge = cbind(hcass$iia[1L:(n - 1)], hcass$iib[1L:(n - 
        1)]), height = hcl$crit[1L:(n - 1)], order = hcass$order, 
        labels = attr(d, "Labels"), method = METHODS[i.meth], 
        call = match.call(), dist.method = attr(d, "method")), 
        class = "hclust")
}

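The differences between the two are as follows:

Both fastcluster::hclust and stats::hclust perform hierarchical clustering, but they differ in implementation and possibly in performance. Judging from the R code, their interfaces are the same: both take a dissimilarity object d (as produced by dist), a method name, and an optional members vector, and both return an object of class "hclust". What differs is how the clustering itself is carried out internally.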
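As a minimal sketch of the shared interface, the following calls both functions with identical arguments. The toy data and variable names are assumptions made for illustration, and the fastcluster part only runs if that package happens to be installed.

```r
# Toy data: 20 points in 2 dimensions (arbitrary values, for illustration only)
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)  # "dist" object carrying the Size, Labels and method attributes

hc_stats <- stats::hclust(d, method = "average")
print(class(hc_stats))  # class set by the structure() call in the source above
print(names(hc_stats))  # components of the returned list

# fastcluster is optional here: compare only if it is installed
if (requireNamespace("fastcluster", quietly = TRUE)) {
  hc_fast <- fastcluster::hclust(d, method = "average")
  # Merge heights are expected to agree up to floating-point error
  print(all.equal(hc_stats$height, hc_fast$height))
}
```

Because the argument lists match, swapping one function for the other needs no change to the calling code.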

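fastcluster::hclust comes from the fastcluster package, which is designed for speed, particularly on large data sets. It is an interface to the fastcluster library, a collection of hierarchical clustering algorithms optimized for speed. As the code shows, the R wrapper is thin: it passes d, members, and the method index to compiled C++ code through the .Call interface and assembles the result into an "hclust" object.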

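stats::hclust is part of the stats package, which ships with every standard R installation, and provides the standard hierarchical clustering functionality. The code shows that it validates its input in R (the number of objects must be between 2 and 65536, and the lengths of d and members must be consistent) and then calls Fortran routines through .Fortran to do the clustering. It is very reliable, but it may be slower than the fastcluster function, especially on very large data sets. Since both functions run compiled code, any speed difference comes from the algorithms and their implementation rather than from the choice of .Call versus .Fortran.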
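The input checks visible in the stats::hclust source can be triggered directly. This is a small sketch; the helper err_of is hypothetical and exists only to capture the error text. It exercises stats::hclust alone, because the fastcluster wrapper shown above performs no such checks at the R level and its behavior on invalid input is not covered here.

```r
# Hypothetical helper: return the error message raised by an expression,
# or NA if the expression succeeds
err_of <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

# A single observation gives a "dist" object with Size 1, which fails the n < 2 check
d_one <- dist(matrix(1, nrow = 1))
e1 <- err_of(stats::hclust(d_one))
print(e1)

# Four observations, but a members vector of length 2
d_four <- dist(matrix(c(0, 1, 3, 7), ncol = 1))
e2 <- err_of(stats::hclust(d_four, members = c(1, 2)))
print(e2)
```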

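In addition, both functions treat the method name "ward" in the same way: they emit a message (not a warning) saying that the "ward" method has been renamed to "ward.D" and pointing to the new "ward.D2" method, then continue with "ward.D". Neither is stricter than the other about method names. The minor differences visible in the code are the order of the METHODS vector, which only changes the integer index passed to the compiled code, and the wording of the error messages. The arguments, such as method and members, are otherwise the same.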
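The special case for "ward" in both listings exists because pmatch cannot resolve that name on its own. The sketch below, using stats::hclust and a four-point toy data set chosen for illustration, shows the partial matching and captures the message. Note that base R's pmatch returns NA for ambiguous as well as missing matches, so the `== -1` branches in both listings do not appear to be reachable through pmatch.

```r
# Method table in the order used by stats::hclust (copied from the source above)
METHODS <- c("ward.D", "single", "complete", "average", "mcquitty",
             "median", "centroid", "ward.D2")

m_ward  <- pmatch("ward", METHODS)    # ambiguous: prefix of ward.D and ward.D2
m_comp  <- pmatch("comp", METHODS)    # unique partial match
m_exact <- pmatch("ward.D", METHODS)  # exact match wins over the partial one
print(c(m_ward, m_comp, m_exact))

# Capture the message that method = "ward" triggers, and keep the result
d <- dist(matrix(c(0, 1, 3, 7), ncol = 1))
msg <- NA_character_
hc <- withCallingHandlers(
  stats::hclust(d, method = "ward"),
  message = function(m) {
    msg <<- conditionMessage(m)
    invokeRestart("muffleMessage")
  }
)
print(msg)
print(hc$method)  # the method actually recorded in the result
```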

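Overall, the main difference is likely to be execution speed, especially on large data sets. fastcluster may be algorithmically more efficient, so it may be the better choice for large data sets. For smaller data sets, or when performance is not the primary concern, stats::hclust is probably sufficient. Because the two functions share the same API, switching between them requires no other code changes, so the choice comes down to your data size and performance requirements.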
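One way to check the speed claim on your own machine is a rough timing comparison. The sketch below is only that: the data size n and the dimensions are arbitrary assumptions, system.time is a coarse measure, and results vary by machine and R version. The fastcluster timing runs only if the package is installed.

```r
set.seed(42)
n <- 2000                            # hypothetical size; increase to stress the difference
x <- matrix(rnorm(n * 5), ncol = 5)  # random points in 5 dimensions
d <- dist(x)

# Time the base implementation
t_stats <- system.time(hc_stats <- stats::hclust(d, method = "complete"))
print(t_stats[["elapsed"]])

# Time fastcluster on the same input, if available
if (requireNamespace("fastcluster", quietly = TRUE)) {
  t_fast <- system.time(hc_fast <- fastcluster::hclust(d, method = "complete"))
  print(t_fast[["elapsed"]])
  # Sanity check: both should produce the same merge heights up to rounding
  print(isTRUE(all.equal(hc_stats$height, hc_fast$height)))
}
```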

From: https://www.cnblogs.com/Ixiaozhu/p/18017575
