Personal Project: Paper Plagiarism Checker
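Course this assignment belongs to | Course link
Where the assignment requirements are | Assignment requirements link
Goal of this assignment | Complete the personal project and upload it to GitHub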
PSP Table
PSP2.1 | Personal Software Process Stages | Estimated time (min) | Actual time (min)
Planning | Plan | 60 | 60
· Estimate | · Estimate how long the task will take | 60 | 60
Development | Develop | 610 | 730
· Analysis | · Requirements analysis (including learning new technologies) | 120 | 120
· Design Spec | · Write the design document | 60 | 60
· Design Review | · Review the design | 20 | 20
· Coding Standard | · Coding standard (set suitable conventions for the current development) | 20 | 20
· Design | · Detailed design | 120 | 180
· Coding | · Actual coding | 180 | 240
· Code Review | · Review the code | 60 | 60
· Test | · Testing (self-testing, fixing code, committing changes) | 30 | 30
Reporting | Report | 90 | 90
· Test Report | · Test report | 30 | 30
· Size Measurement | · Measure the workload | 30 | 30
· Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 30 | 30
 | · Total | 760 | 880
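Design and Implementation of the Computation Module Interface
Program flow
Project structure
Performance Improvements to the Computation Module Interface
Overview
Live memory
- Analysis: the largest memory consumer is the interface provided by the third-party word-segmentation package com.hankcs.hanlp. Reducing it would require either switching to a better segmentation algorithm or improving this one.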
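One lighter-weight direction, sketched here only as an illustration, is to skip dictionary-based segmentation and split the text into overlapping character bigrams. This is a swapped-in technique, not what the project does, and whether it actually lowers memory use or preserves accuracy for this project was not measured. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: character-bigram tokenizer as a dictionary-free
// alternative to a full word segmenter. Not the project's code.
final class BigramTokenizerSketch {

    // Returns overlapping two-character tokens after dropping
    // punctuation (\p{P}), separators (\p{Z}) and ASCII whitespace.
    static List<String> bigrams(String text) {
        String cleaned = text.replaceAll("[\\p{P}\\p{Z}\\s]+", "");
        // Work on code points so surrogate pairs are never split.
        int[] cps = cleaned.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            // A single character cannot form a bigram; keep it as one token.
            tokens.add(new String(cps, 0, 1));
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }
}
```

The resulting tokens could be fed to the same hashing step that currently consumes the segmenter's output. The trade-off is that bigrams carry no word-level meaning, so similarity scores may shift.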
Unit Tests for the Computation Module
Test code
import org.junit.Test;

public class test {
    // Compare the original text against itself and each modified copy,
    // appending every similarity score to the answer file.
    @Test
    public void origAndAllTest() {
        String[] str = new String[6];
        str[0] = main.readTxt("D:/test/orig.txt");
        str[1] = main.readTxt("D:/test/orig_0.8_add.txt");
        str[2] = main.readTxt("D:/test/orig_0.8_del.txt");
        str[3] = main.readTxt("D:/test/orig_0.8_dis_1.txt");
        str[4] = main.readTxt("D:/test/orig_0.8_dis_10.txt");
        str[5] = main.readTxt("D:/test/orig_0.8_dis_15.txt");
        String ansFileName = "D:/test/ans.txt";
        for (int i = 0; i <= 5; i++) {
            double ans = main.getSimilarity(main.getSimHash(str[0]), main.getSimHash(str[i]));
            main.writeTxt(ans, ansFileName);
        }
    }

    @Test
    public void getHammingDistanceTest() {
        String str0 = main.readTxt("D:/test/orig.txt");
        String str1 = main.readTxt("D:/test/orig_0.8_add.txt");
        int distance = main.getHammingDistance(main.getSimHash(str0), main.getSimHash(str1));
        System.out.println("Hamming distance: " + distance);
        // Integer arithmetic: the percentage is truncated to a whole number.
        System.out.println("Similarity: " + (100 - distance * 100 / 128) + "%");
    }

    @Test
    public void getHammingDistanceFailTest() {
        // Case where str0.length() != str1.length()
        String str0 = "10101010";
        String str1 = "1010101";
        System.out.println(main.getHammingDistance(str0, str1));
    }

    @Test
    public void getSimilarityTest() {
        String str0 = main.readTxt("D:/test/orig.txt");
        String str1 = main.readTxt("D:/test/orig_0.8_add.txt");
        int distance = main.getHammingDistance(main.getSimHash(str0), main.getSimHash(str1));
        double similarity = main.getSimilarity(main.getSimHash(str0), main.getSimHash(str1));
        System.out.println("Hamming distance between str0 and str1: " + distance);
        System.out.println("Similarity between str0 and str1: " + similarity);
    }

    @Test
    public void getHashTest() {
        String[] strings = {"今天", "是", "星期天", "天气", "晴", "今天", "晚上", "我", "要", "去", "看", "电影"};
        for (String string : strings) {
            String stringHash = main.getHash(string);
            System.out.println(stringHash.length());
            System.out.println(stringHash);
        }
    }

    @Test
    public void getSimHashTest() {
        String str0 = main.readTxt("D:/test/orig.txt");
        String str1 = main.readTxt("D:/test/orig_0.8_add.txt");
        System.out.println(main.getSimHash(str0));
        System.out.println(main.getSimHash(str1));
    }

    @Test
    public void readTxtTest() {
        // Path exists; the file is read normally
        String str = main.readTxt("D:/test/orig.txt");
        String[] strings = str.split(" ");
        for (String string : strings) {
            System.out.println(string);
        }
    }

    @Test
    public void writeTxtTest() {
        // Path exists; the values are written normally
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (int i = 0; i < elem.length; i++) {
            main.writeTxt(elem[i], "D:/test/ans1.txt");
        }
    }

    @Test
    public void readTxtFailTest() {
        // Path does not exist; the read fails
        String str = main.readTxt("D:/test/none.txt");
    }

    @Test
    public void writeTxtFailTest() {
        // Invalid path; the write fails
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (int i = 0; i < elem.length; i++) {
            main.writeTxt(elem[i], "User:/test/ans1.txt");
        }
    }
}
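The tests above exercise getHash, getSimHash, getHammingDistance and getSimilarity without showing their bodies. As a rough illustration of what such methods might look like, here is a minimal sketch. It assumes a 128-bit fingerprint (suggested by the division by 128 in the test), MD5 as the per-token hash, whitespace-separated tokens instead of HanLP segmentation, and term frequency as the weight. All names are hypothetical, and the project's real implementation may differ on every one of these points.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a SimHash pipeline; not the project's code.
final class SimHashSketch {
    static final int BITS = 128; // assumption: MD5 digest width

    // Hash one token to a string of exactly BITS '0'/'1' characters.
    static String hash(String token) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(token.getBytes(StandardCharsets.UTF_8));
            String bits = new BigInteger(1, digest).toString(2);
            StringBuilder sb = new StringBuilder(BITS);
            for (int i = bits.length(); i < BITS; i++) {
                sb.append('0'); // left-pad so leading zero bits are kept
            }
            return sb.append(bits).toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Build the fingerprint: each token votes +weight or -weight per bit.
    static String simHash(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        int[] votes = new int[BITS];
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            String h = hash(e.getKey());
            int w = e.getValue();
            for (int i = 0; i < BITS; i++) {
                votes[i] += (h.charAt(i) == '1') ? w : -w;
            }
        }
        StringBuilder sb = new StringBuilder(BITS);
        for (int v : votes) {
            sb.append(v > 0 ? '1' : '0');
        }
        return sb.toString();
    }

    // Count differing positions; rejecting unequal lengths is an assumption.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Similarity in [0, 1]: fraction of matching bits.
    static double similarity(String a, String b) {
        return 1.0 - (double) hammingDistance(a, b) / a.length();
    }
}
```

In this sketch, comparing fingerprints of unequal length throws an IllegalArgumentException, which is one way the situation probed by getHammingDistanceFailTest could be handled. How the original code reacts to that case is not shown in the post.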
Test results
Run results
Exception Handling
When the text is empty
Exception class code
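// Assumed enclosing class: the source shows only the constructor.
public class ERROR extends Exception {
    public ERROR(String s) {
        super(s);
    }
}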
Exception class test
@Test
public void emptyTest() {
    String s = "text/orig.txt";
    String t = "text/empty.txt"; // empty file, expected to trigger the exception
    String ansPath = "text/ans.txt";
    Process.solve(s, t, ansPath);
}
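The post does not show where the empty-text check happens inside Process.solve. A minimal sketch of one possible guard follows, assuming the check runs on the text right after it is read and that a checked exception is raised. EmptyTextException and requireNonEmpty are hypothetical names introduced only for this illustration.

```java
// Hypothetical sketch of an empty-text guard; not the project's code.
final class EmptyTextException extends Exception {
    EmptyTextException(String message) {
        super(message);
    }
}

final class EmptyCheckSketch {

    // Returns the text unchanged, or throws if it is null or only whitespace.
    // The label (for example a file path) is included in the error message.
    static String requireNonEmpty(String text, String label) throws EmptyTextException {
        if (text == null || text.trim().isEmpty()) {
            throw new EmptyTextException("Text is empty: " + label);
        }
        return text;
    }
}
```

A caller could wrap each file's contents in requireNonEmpty before computing fingerprints, so an empty input fails early with a clear message instead of producing a meaningless similarity score.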
From: https://www.cnblogs.com/youngbye/p/18069345