
SciTech-Mathematics-Probability+Statistics-Dot products, cosine similarity, text vectors

Time: 2024-07-19 21:20:06
Tags: vector Statistics Probability similarity text vectors cosine norm

Dot products, cosine similarity, text vectors

https://dev.to/sayemmh/dot-products-cosine-similarity-text-vectors-2lo4

Sayem Hoque, Posted on Oct 20, 2022

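Cosine similarity measures how similar two one-dimensional vectors (equal-length lists of numbers) are. It is the cosine of the angle between the vectors, so it ranges from -1 (pointing in opposite directions) through 0 (orthogonal) to 1 (pointing in the same direction). For vectors with no negative entries, such as word-count text vectors, it falls between 0 and 1. The formula is below: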
Cosine Similarity = (A . B) / (||A|| * ||B||)
Where (A . B) is the dot product of vectors A and B: the sum of the element-by-element products of A and B. For example:

A = [1, 2, 3]
B = [4, 5, 6]


A . B
>> 32
# (1 * 4) + (2 * 5) + (3 * 6) = 32
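The dot product above can be sketched in plain Python with no libraries. The function name `dot` is our own, and raising an error when the lengths differ is a choice made for this sketch, since the element-by-element pairing only makes sense for vectors of the same length.

```python
def dot(a, b):
    """Sum of the element-by-element products of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


A = [1, 2, 3]
B = [4, 5, 6]
print(dot(A, B))  # 32, i.e. (1 * 4) + (2 * 5) + (3 * 6)
```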

Meanwhile, ||A|| denotes the L2 norm of vector A, which measures the vector's length in Euclidean space. If you draw a vector with N components as an arrow from the origin in N-dimensional space, the L2 norm is the length of that arrow. To compute it, square each component, sum the squares, and take the square root of the sum.

A = [1, 2, 3]

norm(A)

>> 3.7416573
# (1^2 + 2^2 + 3^2)^0.5 = 3.7416573
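As a plain-Python sketch of the same L2 norm (the name `l2_norm` is our own), the recipe "square, sum, square root" translates directly. The standard library's `math.hypot`, which accepts any number of coordinates from Python 3.8 on, gives the same Euclidean length.

```python
import math


def l2_norm(v):
    """Euclidean length: square each component, sum them, take the square root."""
    return math.sqrt(sum(x * x for x in v))


A = [1, 2, 3]
print(l2_norm(A))      # about 3.7416573, the square root of 14
print(math.hypot(*A))  # same length via the standard library (Python 3.8+)
```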

NumPy provides helpers, so we don't need to do all of these calculations by hand:

import numpy as np
from numpy.linalg import norm

# define two vectors as NumPy arrays
A = np.array([1,2,3,4])
B = np.array([1,2,3,5])

# cosine similarity
cosine = np.dot(A, B) / (norm(A) * norm(B))
print("cosine similarity:", cosine)

>> cosine similarity: 0.9939990885479664
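To see the full -1 to 1 range in action, the NumPy formula can be wrapped in a small function and tried on parallel, orthogonal, and opposite vectors. This is a sketch: the name `cosine_similarity` is our own, and raising an error for an all-zero vector is an assumption we make because the formula divides by zero in that case.

```python
import numpy as np
from numpy.linalg import norm


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; undefined if either vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = norm(a) * norm(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


print(cosine_similarity([1, 2], [2, 4]))    # same direction: 1.0, up to rounding
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal: 0.0
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction: -1.0, up to rounding
```

Note that [1, 2] and [2, 4] score 1 even though one is twice as long as the other: cosine similarity compares direction only, not magnitude.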

A cosine similarity score near 1 means the two vectors point in nearly the same direction: the angle between them is close to 0 degrees, regardless of how long each vector is.
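The title mentions text vectors. One common way to turn text into vectors, assumed here and not necessarily what the original article goes on to describe, is to count how often each word appears over a shared vocabulary (a "bag of words"). The helper names `count_vectors` and `text_cosine` are hypothetical, and the tokenizer is a deliberately naive lowercase whitespace split. Because word counts are never negative, the resulting scores fall between 0 and 1.

```python
from collections import Counter

import numpy as np
from numpy.linalg import norm


def count_vectors(doc_a, doc_b):
    """Turn two strings into word-count vectors over their combined vocabulary.

    Tokenization is a naive lowercase whitespace split (an assumption of this sketch).
    """
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = np.array([counts_a[w] for w in vocab], dtype=float)
    vec_b = np.array([counts_b[w] for w in vocab], dtype=float)
    return vocab, vec_a, vec_b


def text_cosine(doc_a, doc_b):
    """Cosine similarity of the word-count vectors of two documents."""
    _, a, b = count_vectors(doc_a, doc_b)
    denom = norm(a) * norm(b)
    if denom == 0:
        raise ValueError("cannot compare an empty document")
    return float(np.dot(a, b) / denom)


print(text_cosine("the cat sat on the mat", "the cat sat on the hat"))  # about 0.875
print(text_cosine("the cat sat", "dogs bark loudly"))  # no shared words: 0.0
```

In the first pair, both documents share five of their six words, so the count vectors are [1, 0, 1, 1, 1, 2] and [1, 1, 0, 1, 1, 2] over the vocabulary (cat, hat, mat, on, sat, the), giving 7 / (sqrt(8) * sqrt(8)) = 0.875.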

From: https://www.cnblogs.com/abaelhe/p/18312377
