CS-UY 4563 - Introduction to Machine Learning

Posted: 2024-12-10 19:26:08

Final Project

Overview

  • Partner with one student and select a machine learning problem of your choice.
  • Apply the machine learning techniques you’ve learned during the course to your chosen problem.
  • Present your project to the class at the semester’s end.

Submission Requirements on Gradescope

Submit the following on Gradescope by the evening before the first presentation (exact date to be announced):

  • Presentation slides.
  • Project write-up (PDF format).
  • Project code as a Jupyter Notebook. If necessary, a GitHub link is acceptable.
  • If using a custom dataset, upload it to Gradescope (or provide a GitHub link, if necessary).

Project Guidelines

Write-Up Requirements

Your project write-up should include the following:

  1. Introduction: Describe your data set and the problem you aim to solve.
  2. Perform some unsupervised analysis:
  • Explore patterns or structure in the data using clustering and dimensionality reduction (e.g., PCA).
  • Visualize the training data[1]:
    - Plot individual features to understand their distribution (e.g., histograms or density plots).
    - Plot individual features against the target variable to see how they relate to it.
    - Create a correlation matrix to analyze relationships between features.
  • Discuss any interesting structure present in the data. If you don’t find any interesting structure, describe what you tried.
  3. Supervised analysis: Train at least three distinct learning models[2] discussed in class (such as Linear Regression, Logistic Regression, SVM, Neural Networks, or CNNs).[3] For implementation, you may:
  • Use your own implementation, either from homework or developed independently.
  • Use libraries such as Keras, scikit-learn, or TensorFlow.
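The unsupervised analysis in item 2 can be sketched with scikit-learn and NumPy, both of which the guidelines allow. This is a minimal sketch, not a required recipe: the synthetic dataset from `make_classification`, the 60/20/20 split, and the choices of two PCA components, two k-means clusters, and 20 histogram bins are illustrative assumptions. The data is split first so that every step of the analysis touches only the training set, in line with the note about not looking at validation or test data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-in for your dataset: 500 examples, 12 features, binary target.
X, y = make_classification(n_samples=500, n_features=12, n_informative=5, random_state=0)

# Split first (60/20/20, stratified) so the analysis below never sees validation/test data.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

X_train_std = StandardScaler().fit_transform(X_train)

# Dimensionality reduction: a 2-D projection for a scatter plot, plus variance captured.
pca = PCA(n_components=2)
Z = pca.fit_transform(X_train_std)
explained = pca.explained_variance_ratio_

# Clustering: do k-means clusters line up with the labels? (ARI near 0 means they don't.)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_train_std)
ari = adjusted_rand_score(y_train, clusters)

# Distribution of one feature (histogram counts) and the feature-feature correlation matrix.
counts, edges = np.histogram(X_train[:, 0], bins=20)
corr = np.corrcoef(X_train, rowvar=False)  # 12 x 12

# Relationship of each feature with the target (Pearson correlation with the 0/1 label).
target_corr = np.array(
    [np.corrcoef(X_train[:, j], y_train)[0, 1] for j in range(X_train.shape[1])]
)

print("explained variance ratio:", explained.round(3))
print("cluster sizes:", np.bincount(clusters), "ARI vs labels:", round(ari, 3))
print("most target-correlated feature:", int(np.argmax(np.abs(target_corr))))
```

The arrays `Z`, `counts`, `corr`, and `target_corr` are what you would pass to a plotting library such as matplotlib: a scatter of `Z` colored by `y_train`, a histogram per feature, and a heatmap of `corr`.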

For each model,[4] you must:

  • Try different feature transformations. You should have at least three transformations. For example, try polynomial features, PCA, or a radial-basis function kernel. For neural networks, different architectures (e.g., networks with varying numbers of layers) can also be considered forms of feature transformation, as they learn complex representations of the input data.

  • Use different regularization techniques. You should have at least 6 different regularization values per model.

[1] Do not look at the validation or test data.
[2] You can turn a regression task into a classification task by binning, or select a different feature of the same dataset as the target for your model. Alternatively, you can use SVR.
[3] If you wish to use a model not discussed in class, you must discuss it with me first, or you will not receive any points for that model.
[4] Even if you get a very high accuracy, perform these transformations to see what happens.

Table of Results:
  • Provide a table with training and validation metrics for every model. Include results for the different parameter settings (e.g., different regularization values).
    - For classification models, include metrics such as precision and recall.
    - For regression models, report metrics such as MSE and R². For example, suppose you’re using Ridge Regression and varying the value of λ. In that case, your table should contain the training and validation results for every λ value you used.

  • Plot and analyze how performance metrics (such as accuracy, precision, recall, and MSE) change with different feature transformations and hyperparameters (e.g., regularization settings, learning rate).
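One way to organize the sweep over feature transformations and regularization values, and to produce the rows of the results table, is sketched below. It assumes a regression task solved with Ridge Regression, where scikit-learn's `alpha` plays the role of λ. The synthetic data, the transformations (degree-2 polynomial features, 5-component PCA, and `RBFSampler`, which is a random-feature approximation of an RBF kernel rather than an exact kernel), and the six λ values are illustrative choices, not course requirements.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical regression data standing in for your dataset.
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
# X_test / y_test stay untouched until the very end of the project.

# Three feature transformations, plus the raw (standardized) features as a baseline.
# Each entry builds fresh transformer objects so every pipeline is independent.
transforms = {
    "raw": lambda: [],
    "poly2": lambda: [PolynomialFeatures(degree=2, include_bias=False), StandardScaler()],
    "pca5": lambda: [PCA(n_components=5)],
    "rbf": lambda: [RBFSampler(gamma=0.05, n_components=300, random_state=0)],
}
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]  # six regularization strengths

rows = []
for name, make_steps in transforms.items():
    for lam in lambdas:
        model = make_pipeline(StandardScaler(), *make_steps(), Ridge(alpha=lam))
        model.fit(X_train, y_train)
        pred_tr, pred_va = model.predict(X_train), model.predict(X_val)
        rows.append((name, lam,
                     mean_squared_error(y_train, pred_tr), r2_score(y_train, pred_tr),
                     mean_squared_error(y_val, pred_va), r2_score(y_val, pred_va)))

# Print the results table: one row per (transformation, lambda) pair.
print(f"{'transform':<9} {'lambda':>8} {'train MSE':>10} {'train R2':>9} {'val MSE':>10} {'val R2':>7}")
for name, lam, mse_tr, r2_tr, mse_va, r2_va in rows:
    print(f"{name:<9} {lam:>8g} {mse_tr:>10.1f} {r2_tr:>9.3f} {mse_va:>10.1f} {r2_va:>7.3f}")
```

Each tuple in `rows` is one line of the results table. Plotting validation MSE against λ on a log scale, with one curve per transformation, gives the kind of chart the last bullet asks for.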

Analytical Discussion:

  • Analyze the experimental results and explain key findings. Provide a chart of your key findings.
  • Highlight the impact of feature transformations, regularization, and other hyperparameters on the model’s performance. Refer to the graphs provided in earlier sections to support your analysis. Focus on interpreting:
    - Whether the models overfit or underfit the data.
    - How bias and variance affect performance, and which parameter choices helped achieve better generalization.
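A common tool for judging overfitting versus underfitting is a validation curve: mean training and cross-validated scores as a single hyperparameter varies. The sketch below assumes an RBF-kernel SVM whose `gamma` is swept over seven values on synthetic data, scored with 5-fold stratified cross-validation. In a real project, `X` and `y` here would be your training split only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training data (in your project: the training split only).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           flip_y=0.05, random_state=1)

gammas = np.logspace(-4, 2, 7)  # 1e-4 ... 1e2
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Scores have shape (n_gammas, n_folds): one row per gamma, one column per CV fold.
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="svc__gamma", param_range=gammas,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
train_mean, val_mean = train_scores.mean(axis=1), val_scores.mean(axis=1)

for g, tr, va in zip(gammas, train_mean, val_mean):
    print(f"gamma={g:<8.0e} train={tr:.3f} val={va:.3f} gap={tr - va:+.3f}")

best = gammas[np.argmax(val_mean)]  # gamma with the highest mean validation accuracy
print("best gamma:", best)
```

Typically, low scores on both curves indicate underfitting (high bias), which for an RBF SVM tends to occur at small `gamma`; a near-perfect training score with a much lower validation score indicates overfitting (high variance), which tends to occur at large `gamma`. The value stored in `best` is the setting that generalized best among those tried.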

Presentation Guidelines

  • You and your partner will give a six-minute presentation to the class.
  • Presentations will be held during the last 2 or 3 class periods and during the final exam period for this class. You will be assigned a day for your presentation. If we run out of time on the day you are to present your project, you will present on the next day reserved for presentations.
  • Attendance during all presentations is required. Part of your project grade will be based on your attendance at everyone else’s presentations.

Important Notes on Academic Integrity

  • Your submission will undergo plagiarism checks.
  • If we suspect you of cheating, you will receive 0 for your final project grade. See the syllabus for additional penalties that may be applied.

Dataset Resources

Below are some resources where you can search for datasets. As a rough guideline, your dataset should have at least 200 training examples and at least 10 features. You are free to use these resources, look elsewhere, or create your own dataset.

Modifications

  • If you have a project idea that doesn’t satisfy all the requirements mentioned above, please inform me, and we can discuss its viability as your final project.
  • If you use techniques not covered in class, you must demonstrate your understanding of these ideas.

Brightspace Submissions Guidelines
  • Dataset and Partner: Submit the link to your chosen dataset and your partner’s name by October 30th.
  • Final Submissions: Upload your presentation slides, project write-up, and code to Gradescope by the evening before the first scheduled presentation. The exact date will be announced.

Potential Challenges and Resources

As you work with your dataset, you may encounter specific challenges that require additional techniques or tools. Below are some topics and resources that might be useful. Please explore these topics further through online research.

Feature Reduction: Consider using PCA (which will be covered in class). PCA is especially useful when working with SVMs, as they can be slow with high-dimensional data. If you choose to use SelectKBest from scikit-learn, you must understand why it works before you use it.
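The feature-reduction advice can be sketched as two scikit-learn pipelines that feed an SVM: one projects onto principal components, the other keeps the `k` features with the highest ANOVA F-statistic using `SelectKBest(f_classif)`. The 200-feature synthetic dataset and the choices of 20 components and `k=20` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# Hypothetical high-dimensional data: 600 examples, 200 features, 10 informative.
X, y = make_classification(n_samples=600, n_features=200, n_informative=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Option 1: project onto the top 20 principal components, then fit the SVM.
pca_svm = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=20)),
    ("svm", SVC(kernel="rbf")),
])

# Option 2: keep the 20 features whose ANOVA F-statistic against the label is largest.
kbest_svm = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=20)),
    ("svm", SVC(kernel="rbf")),
])

for name, model in [("PCA", pca_svm), ("SelectKBest", kbest_svm)]:
    model.fit(X_train, y_train)  # PCA components / F-scores come from training data only
    print(f"{name}: val accuracy = {model.score(X_val, y_val):.3f}")
```

Placing the scaler, PCA, or `SelectKBest` inside the pipeline means they are fitted on the training data only, so the validation set never influences which components or features are kept. `f_classif` scores each feature on its own by how well the class means separate relative to the within-class variance, which is why univariate selection can miss features that are only useful in combination.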

  • Creating Synthetic Examples: When using SMOTE or other methods to generate synthetic data, ensure that only real data is used in the validation and test sets.
    - If using synthetic data, make sure your validation and test sets mirror the true class proportions of the original dataset. A balanced test set for naturally imbalanced data can give a misleading impression of your model’s real-world performance. For more details, see: Handling Imbalanced Classes
  • Working with Time Series Data: For insights on working with time series data, visit: NIST Handbook on Time Series
  • Handling Missing Feature Values:
    - See Lecture 16 of Stanford STATS 306B
    - Techniques to Handle Missing Data Values
    - How to Handle Missing Data in Python
    - Statistical Imputation for Missing Data
  • Multiclass Classification:
    - Understanding Softmax in Multiclass Classification
    - Precision and Recall for Multiclass Metrics
  • Optimizers for Neural Networks: You may use Adam or other optimizers for training neural networks.
  • Centering Image Data with Bounding Boxes: If you are working with image data, you are allowed to use bounding boxes to center the objects in your images. You can use libraries like OpenCV (‘cv2’). Don’t forget to scale your data as part of preprocessing. Be sure to document any modifications you made, including the scaling or normalization techniques you applied.

The following resource might be helpful. Please stick to topics we discussed in class or those mentioned above: CS229: Practical Machine Learning Advice
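The rule that synthetic examples must stay out of the validation and test sets comes down to the order of operations: split the real data first (stratified, so the held-out sets keep the true class proportions), then oversample only the training split. The sketch below assumes a binary problem with roughly 10% positives and uses `smote_like`, a hypothetical minimal helper that follows the SMOTE idea of interpolating between a minority example and one of its nearest minority neighbors. It is not the `imbalanced-learn` `SMOTE` class, which you would normally use instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X_min, n_new, k=5, rng=None):
    """Hypothetical SMOTE-style helper (not the imbalanced-learn API): each synthetic
    point lies on the segment between a minority example and one of its k nearest
    minority neighbours."""
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)             # column 0 is normally the point itself
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    lam = rng.random((n_new, 1))              # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[neigh] - X_min[base])


# Hypothetical imbalanced data: roughly 10% positives.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0)

# 1) Split the REAL data first; stratify keeps the true class ratio in validation/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# 2) Oversample the minority class in the training split only.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote_like(X_min, n_new, k=5, rng=0)
X_train_bal = np.vstack([X_train, X_syn])
y_train_bal = np.concatenate([y_train, np.ones(n_new, dtype=y_train.dtype)])

print("train positive rate (after oversampling):", y_train_bal.mean().round(3))
print("validation positive rate (real data):", y_val.mean().round(3))
```

The training split ends up balanced, while `y_val` and `y_test` keep roughly the original positive rate, so validation metrics such as precision and recall reflect the imbalance the model will face in practice.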

From: https://www.cnblogs.com/CSE2425/p/18596909
