
An Introduction to C-MAPSS, a Widely Used Dataset for RUL Prediction


C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) is a dataset for predicting the remaining useful life (RUL) of aircraft engines. Released by NASA, it consists of four sub-datasets of simulated turbofan engine sensor data, differing in operating conditions and fault modes, together with the corresponding remaining-useful-life labels, as summarized in Table 1.

Table 1 Information of the C-MAPSS dataset.

| Dataset | FD001 | FD002 | FD003 | FD004 |
| --- | --- | --- | --- | --- |
| Engine units for training | 100 | 260 | 100 | 249 |
| Engine units for testing | 100 | 259 | 100 | 248 |
| Operating conditions | 1 | 6 | 1 | 6 |
| Fault modes | 1 | 1 | 2 | 2 |
| Training samples (default) | 17731 | 48819 | 21820 | 57522 |
| Testing samples | 100 | 259 | 100 | 248 |

The four sub-datasets FD001~FD004 share exactly the same set of parameters. The raw files are plain text (.txt); the meaning of each column is given in Table 2. The recorded signals comprise the engine's operational settings plus 21 sensor channels covering temperatures, pressures, shaft speeds, and related quantities (Table 3), and can be used to train and evaluate fault-diagnosis and RUL-prediction models. The dataset is widely used in machine learning and data mining research and provides valuable data support for aero-engine health management.

Table 2 Column layout of the raw files

| Columns | 1 | 2 | 3~5 | 6~26 |
| --- | --- | --- | --- | --- |
| Meaning | engine unit id | current cycle number | operational settings | sensor measurements |
Table 3 Names, meanings, and symbols of the 21 sensor columns (provided as an image in the original post; not reproduced here).
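For a quick sanity check, the raw text files can be loaded with pandas using the column layout from Table 2. This is a minimal sketch; the ./data/ directory is an assumption that matches the preprocessing code below.

import pandas as pd

index_names = ['unit_nr', 'time_cycles']
setting_names = ['setting_1', 'setting_2', 'setting_3']
sensor_names = ['s_{}'.format(i) for i in range(1, 22)]
col_names = index_names + setting_names + sensor_names  # 26 columns in total

# the files are whitespace-separated with no header row
df = pd.read_csv('./data/train_FD001.txt', sep=r'\s+', header=None, names=col_names)
print(df.shape)                 # (n_rows, 26)
print(df['unit_nr'].nunique())  # 100 engine units in FD001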

Data preprocessing code (in Python)

The code below comes from the public code released by the authors of "Variational encoding approach for interpretable assessment of remaining useful life estimation". I have made some changes and cannot guarantee it is entirely correct, so please use it with care.
github: https://github.com/NahuelCostaCortez/RemainingUseful-Life-Estimation-Variational

import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.preprocessing import StandardScaler
import pandas as pd

def add_remaining_useful_life(df):
    # Get the total number of cycles for each unit
    grouped_by_unit = df.groupby(by="unit_nr")
    max_cycle = grouped_by_unit["time_cycles"].max()
    
    # Merge the max cycle back into the original frame
    result_frame = df.merge(max_cycle.to_frame(name='max_cycle'), left_on='unit_nr', right_index=True)
    
    # Calculate remaining useful life for each row
    remaining_useful_life = result_frame["max_cycle"] - result_frame["time_cycles"]
    result_frame["RUL"] = remaining_useful_life
    
    # drop max_cycle as it's no longer needed
    result_frame = result_frame.drop("max_cycle", axis=1)
    return result_frame
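# Example (added for illustration): for an engine unit whose last recorded
# cycle is 192, the row with time_cycles == 150 gets RUL = 192 - 150 = 42,
# and the final row gets RUL = 0.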

def add_operating_condition(df):
    df_op_cond = df.copy()
    
    df_op_cond['setting_1'] = abs(df_op_cond['setting_1'].round())
    df_op_cond['setting_2'] = abs(df_op_cond['setting_2'].round(decimals=2))
    
    # converting the settings to strings and concatenating them turns the operating condition into a categorical variable
    df_op_cond['op_cond'] = df_op_cond['setting_1'].astype(str) + '_' + \
                        df_op_cond['setting_2'].astype(str) + '_' + \
                        df_op_cond['setting_3'].astype(str)
    
    return df_op_cond
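# Example (added for illustration): a row with raw settings
# (42.0038, 0.8405, 100.0) would be rounded to (42.0, 0.84, 100.0) and mapped
# to the categorical key '42.0_0.84_100.0'.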

def condition_scaler(df_train, df_test, sensor_names):
    # apply operating condition specific scaling
    scaler = StandardScaler()
    for condition in df_train['op_cond'].unique():
        scaler.fit(df_train.loc[df_train['op_cond']==condition, sensor_names])
        df_train.loc[df_train['op_cond']==condition, sensor_names] = scaler.transform(df_train.loc[df_train['op_cond']==condition, sensor_names])
        df_test.loc[df_test['op_cond']==condition, sensor_names] = scaler.transform(df_test.loc[df_test['op_cond']==condition, sensor_names])
    return df_train, df_test
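# Note (added): the scaler is re-fit on the training rows of each operating
# condition, and the same fit is applied to the matching test rows; an
# operating condition that appears only in the test set would be left unscaled.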

def exponential_smoothing(df, sensors, n_samples, alpha=0.4):
    df = df.copy()
    # first, take the exponential weighted mean
    df[sensors] = df.groupby('unit_nr')[sensors].apply(lambda x: x.ewm(alpha=alpha).mean()).reset_index(level=0, drop=True)
    
    # second, drop first n_samples of each unit_nr to reduce filter delay
    def create_mask(data, samples):
        result = np.ones_like(data)
        result[0:samples] = 0
        return result
    
    mask = df.groupby('unit_nr')['unit_nr'].transform(create_mask, samples=n_samples).astype(bool)
    df = df[mask]
    
    return df
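# Note (added): ewm(alpha=alpha).mean() weights past samples with factors
# that decay by (1 - alpha), so a larger alpha means less smoothing; dropping
# the first n_samples rows per unit discards the filter's initial transient.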

def gen_train_data(df, sequence_length, columns):
    data = df[columns].values
    num_elements = data.shape[0]

    # -1 and +1 because of Python indexing
    for start, stop in zip(range(0, num_elements-(sequence_length-1)), range(sequence_length, num_elements+1)):
        yield data[start:stop, :]
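# Note (added): for a unit with T rows this yields T - sequence_length + 1
# overlapping windows, each of shape (sequence_length, len(columns)).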
        
def gen_data_wrapper(df, sequence_length, columns, unit_nrs=np.array([])):
    if unit_nrs.size <= 0:
        unit_nrs = df['unit_nr'].unique()
        
    data_gen = (list(gen_train_data(df[df['unit_nr']==unit_nr], sequence_length, columns))
               for unit_nr in unit_nrs)
    data_array = np.concatenate(list(data_gen)).astype(np.float32)
    return data_array

def gen_labels(df, sequence_length, label):
    data_matrix = df[label].values
    num_elements = data_matrix.shape[0]

    # -1 because we want to predict the RUL of the last row in each sequence, not the next row
    return data_matrix[sequence_length-1:num_elements, :]  
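# Note (added): this aligns each window from gen_train_data with the RUL of
# that window's last row, so the number of labels equals the number of windows.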

def gen_label_wrapper(df, sequence_length, label, unit_nrs=np.array([])):
    if unit_nrs.size <= 0:
        unit_nrs = df['unit_nr'].unique()
        
    label_gen = [gen_labels(df[df['unit_nr']==unit_nr], sequence_length, label) 
                for unit_nr in unit_nrs]
    label_array = np.concatenate(label_gen).astype(np.float32)
    return label_array

def gen_test_data(df, sequence_length, columns, mask_value):
    if df.shape[0] < sequence_length:
        data_matrix = np.full(shape=(sequence_length, len(columns)), fill_value=mask_value) # pad
        idx = data_matrix.shape[0] - df.shape[0]
        data_matrix[idx:,:] = df[columns].values  # fill with available data
    else:
        data_matrix = df[columns].values
        
    # yield only the last possible sequence
    stop = data_matrix.shape[0]
    start = stop - sequence_length
    yield data_matrix[start:stop, :]
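# Note (added): a test unit shorter than sequence_length is left-padded with
# mask_value (-99. in get_data below), so every test unit contributes exactly
# one window of fixed length.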
        
	
def get_data(dataset, sensors, sequence_length, alpha, threshold):
    # files
    dir_path = './data/'
    train_file = 'train_' + dataset + '.txt'
    test_file = 'test_' + dataset + '.txt'
    # columns
    index_names = ['unit_nr', 'time_cycles']
    setting_names = ['setting_1', 'setting_2', 'setting_3']
    sensor_names = ['s_{}'.format(i + 1) for i in range(0, 21)]
    col_names = index_names + setting_names + sensor_names
    # data readout
    train = pd.read_csv((dir_path + train_file), sep=r'\s+', header=None,
                        names=col_names)
    test = pd.read_csv((dir_path + test_file), sep=r'\s+', header=None,
                       names=col_names)
    y_test = pd.read_csv((dir_path + 'RUL_' + dataset + '.txt'), sep=r'\s+', header=None,
                         names=['RemainingUsefulLife'])

    # create RUL values according to the piece-wise linear target function:
    # RUL(t) = min(max_cycle - t, threshold)
    train = add_remaining_useful_life(train)
    train['RUL'] = train['RUL'].clip(upper=threshold)
    y_test['RemainingUsefulLife'] = y_test['RemainingUsefulLife'].clip(upper=threshold)

    # remove unused sensors
    drop_sensors = [element for element in sensor_names if element not in sensors]

    # scale with respect to the operating condition
    X_train_pre = add_operating_condition(train.drop(drop_sensors, axis=1))
    X_test_pre = add_operating_condition(test.drop(drop_sensors, axis=1))
    X_train_pre, X_test_pre = condition_scaler(X_train_pre, X_test_pre, sensors)

    # exponential smoothing
    X_train_pre = exponential_smoothing(X_train_pre, sensors, 0, alpha)
    X_test_pre = exponential_smoothing(X_test_pre, sensors, 0, alpha)

    # train-val split, grouped by engine unit so no unit appears in both sets
    gss = GroupShuffleSplit(n_splits=1, train_size=0.80, random_state=42)
    # n_splits=1, so this loop runs exactly once; train_unit and val_unit are
    # arrays of positional indices into the array of unique unit ids
    for train_unit, val_unit in gss.split(X_train_pre['unit_nr'].unique(), groups=X_train_pre['unit_nr'].unique()):
        train_unit = X_train_pre['unit_nr'].unique()[train_unit]  # map indices back to unit ids
        val_unit = X_train_pre['unit_nr'].unique()[val_unit]

        x_train = gen_data_wrapper(X_train_pre, sequence_length, sensors, train_unit)
        y_train = gen_label_wrapper(X_train_pre, sequence_length, ['RUL'], train_unit)

        x_val = gen_data_wrapper(X_train_pre, sequence_length, sensors, val_unit)
        y_val = gen_label_wrapper(X_train_pre, sequence_length, ['RUL'], val_unit)

    # create sequences for test
    test_gen = (list(gen_test_data(X_test_pre[X_test_pre['unit_nr'] == unit_nr], sequence_length, sensors, -99.))
                for unit_nr in X_test_pre['unit_nr'].unique())
    x_test = np.concatenate(list(test_gen)).astype(np.float32)

    return x_train, y_train, x_val, y_val, x_test, y_test['RemainingUsefulLife']
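A minimal usage sketch follows. The sensor subset, window length, smoothing factor, and RUL cap below are illustrative choices, not values prescribed by the original authors.

sensors = ['s_2', 's_3', 's_4', 's_7', 's_11', 's_12']  # illustrative subset
x_train, y_train, x_val, y_val, x_test, y_test = get_data(
    'FD001', sensors, sequence_length=30, alpha=0.4, threshold=125)
print(x_train.shape)  # (n_windows, 30, 6)
print(x_test.shape)   # (100, 30, 6) -- one window per FD001 test unit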


From: https://www.cnblogs.com/huxiaohu52/p/17378657.html
