
MLE 5217 : Take-Home Dataset Classification

Posted: 2024-10-20 19:11:04

Dept. of Materials Science & Engineering NUS

MLE 5217 : Take-Home Assignments

Lecturer: Sasani Jayawardhana

Objectives

Based on the chemical composition of materials, build a classification model to distinguish metals from non-metals (Model 1), and then build a regression model to predict the bandgap of non-metallic compounds (Model 2).

Please use a separate Jupyter notebook for each of the models.

Data

The data contains the chemical formula and energy band gaps (in eV) of experimentally measured compounds. These measurements have been obtained using a number of techniques such as diffuse reflectance, resistivity measurements, surface photovoltage, photoconduction, and UV-vis measurements. Therefore a given compound may have more than one measurement value.
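Because a compound can appear with several measured values, one possible preprocessing step is to collapse repeated formulas to a single value before modelling. The sketch below uses the median, which is only one reasonable choice. The function name `aggregate_band_gaps` and the (formula, band gap) pair layout are assumptions made for illustration, not the actual CSV schema.

```python
from collections import defaultdict
from statistics import median


def aggregate_band_gaps(measurements):
    """Collapse repeated measurements of the same formula to their median.

    `measurements` is an iterable of (formula, band_gap_eV) pairs. This
    layout is an assumption for the sketch, not the real file format.
    """
    grouped = defaultdict(list)
    for formula, gap in measurements:
        grouped[formula.strip()].append(float(gap))
    return {formula: median(gaps) for formula, gaps in grouped.items()}


# Made-up values for illustration only; they are not taken from the dataset.
example = [("GaAs", 1.40), ("GaAs", 1.44), ("GaAs", 1.52), ("Cu", 0.0)]
print(aggregate_band_gaps(example))
```

Whether to take the median, the mean, or to keep every measurement is a modelling decision that should be justified in the notebook. Keeping duplicates also risks the same compound landing in both the training and the test split.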

Tasks

Model I (30 marks)

Dataset: Classification data.csv

Fit a Support Vector Classification model to separate metals from non-metals in the data. Ensure that you:

  • Follow the usual machine learning process.
  • Use a suitable composition-based feature vector to vectorize the chemical compounds.
  • You may use your judgement on how to differentiate between metals and non-metals. As a guide, two possible options are given below.
      Option 1: for metals Eg = 0, and for non-metals Eg > 0
      Option 2: for metals Eg ≤ 0.5, and for non-metals Eg > 0.5
  • Use suitable metrics to quantify the performance of the classifier.
  • For added advantage you may optimize the hyper-parameters of the Support Vector Classifier. Note: Optimization algorithms can require high processing power and may therefore cause your computer to freeze (ensure you have saved all your work before you run such code). In such a case you may either do a manual optimization or leave the code unexecuted.
  • Comment on the overall performance of the model.
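The steps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the atomic-fraction featurizer is a hand-rolled stand-in for a proper composition featurizer, the helper names (`element_fractions`, `featurize`, `label_metals`, `fit_svc`) are hypothetical, the threshold comparison treats metals as Eg at or below the threshold, and the hyper-parameter grid is kept deliberately small so the search stays cheap.

```python
import re

import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def element_fractions(formula):
    """Parse a flat formula such as 'Fe2O3' into {element: atomic fraction}.

    Parentheses and hydrates are NOT handled in this sketch.
    """
    counts = {}
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def featurize(formulas):
    """Build a matrix with one column per element seen in `formulas`."""
    parsed = [element_fractions(f) for f in formulas]
    elements = sorted({el for p in parsed for el in p})
    X = np.array([[p.get(el, 0.0) for el in elements] for p in parsed])
    return X, elements


def label_metals(band_gaps, threshold=0.0):
    """Return 1 for metals (Eg <= threshold) and 0 for non-metals.

    threshold=0.0 corresponds to Option 1 and threshold=0.5 to Option 2,
    assuming band gaps are never negative.
    """
    return (np.asarray(band_gaps, dtype=float) <= threshold).astype(int)


def fit_svc(X, y, seed=0):
    """Hold out a test set, tune an SVC on the rest, and report test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    pipeline = make_pipeline(StandardScaler(), SVC())
    # Small, arbitrary grid: widen it only if the machine can cope.
    grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}
    search = GridSearchCV(pipeline, grid, cv=3, scoring="f1")
    search.fit(X_train, y_train)
    report = classification_report(y_test, search.predict(X_test), zero_division=0)
    return search, report
```

A typical call would be `X, elements = featurize(formulas)`, then `y = label_metals(band_gaps, threshold=0.5)`, then `search, report = fit_svc(X, y)`. Scaling lives inside the pipeline so that each cross-validation fold is standardized using its own training portion only. If the two classes turn out to be imbalanced, accuracy alone can be misleading, which is why the report includes per-class precision, recall and F1.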

Model II (30 marks)

Dataset: Regression data.csv

Fit a regression equation to the non-metals to predict the bandgap energies based on their chemical composition.
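One way to approach this task is to score a few candidate regressors with cross-validation before committing to one. The sketch below assumes a numeric feature matrix `X` has already been built from the compositions and that `y` holds the band gaps in eV. The function name `compare_regressors`, the two candidate models, and their settings are illustrative choices rather than requirements of the assignment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_regressors(X, y, seed=0):
    """Return {model name: (mean MAE, mean R^2)} from 5-fold cross-validation."""
    models = {
        # Linear baseline; features are standardized inside each fold.
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        # Non-linear alternative; tree models do not need scaling.
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_validate(
            model, X, y, cv=folds, scoring=("neg_mean_absolute_error", "r2")
        )
        # scikit-learn reports MAE negated, so flip the sign back.
        mae = -scores["test_neg_mean_absolute_error"].mean()
        results[name] = (mae, scores["test_r2"].mean())
    return results
```

Reporting MAE in eV alongside R^2 makes the error easier to interpret physically, and the same function can be rerun with a different feature vector to compare featurization choices.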

  • Use a suitable composition-based feature vector to vectorize the chemical compounds. You may try multiple feature vectors and analyse the outcomes.
  • You may experiment with different models for regression analysis if required.
  • Comment on the overall performance of the model and suggest any shortcomings or potential improvements.

September 2024

Important: Comments

  • Write clear comments in the code so that a user can follow the logic.
  • In instances where you have made decisions, justify them.
  • In instances where you may have decided to follow a different analysis path (than what is outlined in the tasks), explain your thinking in the comments.
  • Acknowledge (if any) references used at the bottom of the notebook.

Submission

  • Ensure that each of the cells of code in the final Jupyter notebooks has been run for output (except for the hyper-parameter optimization, if any).
  • Ensure that the two models (I and II) have been entered in two separate notebooks.
  • Name the files by your name as "YourName 1.ipynb" and "YourName 2.ipynb".
  • It is your responsibility to ensure that the correct files are being submitted, and that the file extensions are in the correct format (.ipynb).
  • Submission will be via Canvas, and late submissions will be penalized.

Evaluation

The primary emphasis will be on the depth and thoroughness of your approach to the problem. Key areas of focus will include:

* Data Exploration: Demonstrating a thorough investigation of the data, exploring different analytical possibilities, and thoughtfully selecting the best course of action.

* Implementation: Translating your chosen approach into clean and efficient code.

* Machine Learning Process: Executing the machine learning process correctly and methodically, ensuring proper data handling, model selection, and evaluation.

* Clarity of Explanation: Providing clear explanations of each step, with logical reasoning for the decisions made.

* Critical Analysis: Identifying any limitations of the approach, suggesting potential improvements, and making relevant statistical inferences based on the results.

================================================================

From: https://www.cnblogs.com/goodlunn/p/18486999