
CIS5200: Machine Learning

Fall 2024, Homework 2. Release Date: October 9, 2024. Due Date: October 18, 2024

  • HW2 will count for 10% of the grade. This grade will be split between the written (30 points) and programming (40 points) parts.
  • All written homework solutions are required to be formatted using LaTeX. Please use the template here. Do not modify the template. This is a good resource to get yourself more familiar with LaTeX, if you are still not comfortable.
  • You will submit your solution for the written part of HW2 as a single PDF file via Gradescope. The deadline is 11:59 PM ET. Contact TAs on Ed if you face any issues uploading your homeworks.
  • Collaboration is permitted and encouraged for this homework, though each student must understand, write, and hand in their own submission. In particular, it is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another student's written solutions when writing their own. It is also not acceptable to publicly post your (partial) solution on Ed, but you are encouraged to ask public questions on Ed. If you choose to collaborate, you must indicate on each homework with whom you collaborated.

Please refer to the notes and slides posted on the website if you need to recall the material discussed in the lectures.

1 Written Questions (30 points)

Problem 1: Gradient Descent (20 points)

Consider a training dataset S = {(x_1, y_1), . . . , (x_m, y_m)} where for all i ∈ [m], ∥x_i∥₂ ≤ 1 and y_i ∈ {−1, 1}. Suppose we want to run regularized logistic regression, that is, solve the following optimization problem for a regularization term R(w):
$$\min_{w} \; F(w) = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i \, w^\top x_i\right)\right) + R(w).$$
To show that a twice differentiable function f is µ-strongly convex, it suffices to show that the Hessian satisfies ∇²f ⪰ µI. Similarly, to show that a twice differentiable function f is L-smooth, it suffices to show that the Hessian satisfies LI ⪰ ∇²f. Here I is the identity matrix of the appropriate dimension.
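The Hessian conditions in this hint can also be probed numerically, which is a handy sanity check before writing the proof. The sketch below is a minimal illustration assuming the average logistic loss written above; the helper name logistic_objective and the synthetic data are made up for this example and are not part of the assignment.

```python
import torch

def logistic_objective(w, X, y):
    # Average logistic loss (1/m) * sum_i log(1 + exp(-y_i <w, x_i>)), i.e. R(w) = 0.
    margins = y * (X @ w)
    return torch.nn.functional.softplus(-margins).mean()

torch.manual_seed(0)
m, d = 50, 3
X = torch.randn(m, d)
X = X / X.norm(dim=1, keepdim=True).clamp(min=1.0)   # enforce ||x_i||_2 <= 1
y = (torch.randint(0, 2, (m,)) * 2 - 1).float()      # labels in {-1, +1}
w = torch.randn(d)

# Eigenvalues of the Hessian at w: the smallest relates to mu-strong convexity,
# the largest to L-smoothness, in the sense of mu*I <= Hessian <= L*I.
H = torch.autograd.functional.hessian(lambda v: logistic_objective(v, X, y), w)
eigs = torch.linalg.eigvalsh(H)
print("smallest Hessian eigenvalue:", eigs.min().item())
print("largest Hessian eigenvalue: ", eigs.max().item())
```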

1.1 (3 points) In the case where R(w) = 0, we know that the objective is convex. Is it strongly convex? Explain your answer.

1.2 (3 points) In the case where R(w) = 0, show that the objective is 1-smooth.

1.3 (4 points) In the case of R(w) = 0, what is the largest learning rate that you can choose such that the objective is non-increasing at each iteration? Explain your answer.
Hint: The answer is not 1/L for an L-smooth function.
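A quick empirical check, which does not replace the argument asked for in 1.3, is to run a few gradient descent steps with a candidate step size and watch whether the objective ever goes up. This sketch reuses the hypothetical logistic_objective, X, and y from the sketch above; the step sizes tried are arbitrary illustrations, not the answer.

```python
def run_gd(eta, steps=20):
    # Plain gradient descent on the R(w) = 0 objective, recording F(w_t) at each step.
    w = torch.zeros(X.shape[1])
    values = []
    for _ in range(steps):
        w = w.detach().requires_grad_(True)
        loss = logistic_objective(w, X, y)
        values.append(loss.item())
        loss.backward()
        with torch.no_grad():
            w = w - eta * w.grad
    return values

for eta in (0.5, 2.0, 8.0):   # illustrative candidates only
    vals = run_gd(eta)
    monotone = all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
    print(f"eta={eta}: objective non-increasing over 20 steps? {monotone}")
```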

1.4 (1 point) What is the convergence rate of gradient descent on this problem with R(w) = 0? In other words, suppose I want to achieve $F(w_{T+1}) - F(w^*) \le \epsilon$, where w* is the minimizer; express the number of iterations T that I need to run GD for.
Note: You do not need to reprove the convergence guarantee, just use the guarantee to provide the rate.

1.5 (5 points) Consider the following variation of the ℓ₂ norm regularizer, called the weighted ℓ₂ norm

1.6 (4 points) If a function is µ-strongly convex and L-smooth, after T iterations of gradient descent we have:
$$\|w_{T+1} - w^*\|_2^2 \le \left(1 - \frac{\mu}{L}\right)^{T} \|w_1 - w^*\|_2^2.$$
Using the above, what is the convergence rate of gradient descent on the regularized logistic regression problem with the weighted ℓ₂ norm penalty? In other words, suppose I want to achieve $\|w_{T+1} - w^*\|_2 \le \epsilon$; express the number of iterations T that I need to run GD.

Note: You do not need to prove the given convergence guarantee, just provide the rate.
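To see what this kind of guarantee looks like in practice, the sketch below runs gradient descent on a simple strongly convex and smooth quadratic (not the homework objective), where µ and L are just the smallest and largest Hessian eigenvalues, and compares the observed squared distance to the minimizer against the (1 − µ/L)^T bound.

```python
import torch

A = torch.diag(torch.tensor([0.5, 1.0, 4.0]))   # Hessian of F(w) = 0.5 * w^T A w; mu = 0.5, L = 4.0
mu, L = 0.5, 4.0                                 # minimizer is w* = 0
w = torch.tensor([3.0, -2.0, 1.0])
eta = 1.0 / L

dists = []
for t in range(30):
    dists.append((w.norm() ** 2).item())         # ||w_t - w*||_2^2 with w* = 0
    w = w - eta * (A @ w)                        # gradient step: grad F(w) = A w

for t in (0, 10, 20, 29):
    bound = (1 - mu / L) ** t * dists[0]
    print(f"t={t:2d}  ||w_t - w*||^2 = {dists[t]:.3e}   bound = {bound:.3e}")
```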

Problem 2: MLE for Linear Regression (10 points)

In this question, you are going to derive an alternative justification for linear regression via the squared loss. In particular, we will show that linear regression via minimizing the squared loss is equivalent to maximum likelihood estimation (MLE) in the following statistical model.

Assume that for a given x, there exists a true linear function parameterized by w so that the label y is generated randomly as
$$y = w^\top x + \epsilon,$$
where ϵ ∼ N(0, σ²) is some normally distributed noise with mean 0 and variance σ² > 0. In other words, the labels of your data are equal to some true linear function, plus Gaussian noise around that line.
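The generative model above is easy to simulate. The sketch below draws data from it using made-up values of w and σ chosen purely for illustration; the later sketches in this problem reuse these tensors.

```python
import torch

torch.manual_seed(0)
m, d = 200, 3
w_true = torch.tensor([1.0, -2.0, 0.5])   # illustrative "true" linear function
sigma = 0.3                               # illustrative noise level

X = torch.randn(m, d)
eps = sigma * torch.randn(m)              # eps ~ N(0, sigma^2)
y = X @ w_true + eps                      # y = w^T x + eps
```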

2.1 (3 points) Show that the above model implies that the conditional density of y given x is
$$p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - w^\top x)^2}{2\sigma^2}\right).$$
Hint: Use the density function of the normal distribution, or the fact that adding a constant to a Gaussian random variable shifts the mean by that constant.

2.2 (2 points) Show that the risk of the predictor f(x) = E[y|x] is σ², that is,
$$R(f) = \mathbb{E}_{x,y}\left[(y - f(x))^2\right] = \sigma^2.$$
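With the simulated data from the previous sketch, this claim can also be checked empirically: the average squared error of f(x) = E[y | x] = wᵀx over the generated samples should come out close to σ².

```python
# Reuses X, y, w_true, sigma from the simulation sketch above.
residuals = y - X @ w_true                # y - f(x) with f(x) = w^T x
print("empirical risk:", residuals.pow(2).mean().item(), "  sigma^2:", sigma ** 2)
```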

2.3 (3 points) The likelihood for the given data {(x_1, y_1), . . . , (x_m, y_m)} is given by
$$\hat{L}(w, \sigma) = p(y_1, \ldots, y_m \mid x_1, \ldots, x_m) = \prod_{i=1}^{m} p(y_i \mid x_i).$$
Compute the log conditional likelihood, that is, $\log \hat{L}(w, \sigma)$.
Hint: Use your expression for p(y | x) from part 2.1.

2.4 (2 points) Show that the maximizer of $\log \hat{L}(w, \sigma)$ is the same as the minimizer of the empirical risk with squared loss,
$$\hat{R}(w) = \frac{1}{m} \sum_{i=1}^{m} (y_i - w^\top x_i)^2.$$
Hint: Take the derivative of your result from 2.3 and set it equal to zero.
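As a numerical illustration of this equivalence (not a substitute for the derivation), the sketch below compares the least-squares solution computed with torch.linalg.lstsq against the w found by maximizing the Gaussian log-likelihood with gradient ascent; under the model above the two should essentially coincide. It reuses X, y, and sigma from the simulation sketch.

```python
# Least-squares minimizer of the empirical squared-loss risk.
w_ls = torch.linalg.lstsq(X, y.unsqueeze(1)).solution.squeeze(1)

# Maximize the log-likelihood sum_i log p(y_i | x_i) over w by gradient ascent
# (implemented as descent on the negative mean log-likelihood, which has the same argmax).
w_mle = torch.zeros(X.shape[1], requires_grad=True)
opt = torch.optim.SGD([w_mle], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    log_lik = torch.distributions.Normal(X @ w_mle, sigma).log_prob(y).mean()
    (-log_lik).backward()
    opt.step()

print("least-squares w:", w_ls)
print("MLE w:          ", w_mle.detach())
```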

2 Programming Questions (20 points)

Use the link here to access the Google Colaboratory (Colab) file for this homework. Be sure to make a copy by going to "File", and "Save a copy in Drive". As with the previous homeworks, this assignment uses the PennGrader system for students to receive immediate feedback. As noted on the notebook, please be sure to change the student ID from the default '99999999' to your 8-digit PennID.

Instructions for how to submit the programming component of HW 2 to Gradescope are included in the Colab notebook. You may find this PyTorch linear algebra reference and this general PyTorch reference to be helpful in perusing the documentation and finding useful functions for your implementation.

From: https://www.cnblogs.com/CSSE2310/p/18516499
