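I have a Matlab script that reads an encoded .dat file, decodes it, and saves the result. I'm trying to convert it to Python using NumPy. For the same file I get different output (the Python numbers don't make sense). The code originally ran as part of a script that reads from a serial port, which explains the structure of the data.

At first I thought the bit shifts were the problem, because of the indexing difference and because NumPy splits right and left shifts into separate functions, but that doesn't seem to be the case. I also tried separating the bit shifts from the dot products to confirm they happen in the right order, but that wasn't the problem either. Then I replaced the Python `.read()` with `np.fromfile()`, thinking that reading without specifying the correct `uint8` format might corrupt the data. That's when I noticed while debugging that `packet_data` contains different values from what I get in Matlab. I assumed reshaping the data would scramble it, but the matrix actually contains completely different values. Of course, this also means `packet_data_bytes` is completely wrong. I don't know why, but the same file gives different values when read in Python than when read in Matlab. I'm not sure what the difference between the read functions is, or whether it has to do with how I open the file in the script.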
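A minimal check of the read functions themselves, assuming a small throwaway test file rather than the real Raw.dat: reading with `read()` plus `np.frombuffer` and reading with `np.fromfile` should return the same unsigned bytes, and both advance the file position the same way, so the choice of read function alone is not expected to change the values.

```python
import os
import tempfile

import numpy as np

# Hypothetical throwaway test file containing the byte values 0..255 once each
with tempfile.NamedTemporaryFile(delete=False, suffix=".dat") as tmp:
    tmp.write(bytes(range(256)))
    path = tmp.name

try:
    # Variant 1: file.read() + np.frombuffer (4 header bytes, then 12 data bytes)
    with open(path, "rb") as f:
        head_a = np.frombuffer(f.read(4), dtype=np.uint8)
        body_a = np.frombuffer(f.read(12), dtype=np.uint8)

    # Variant 2: np.fromfile on the open file object, same two sequential reads
    with open(path, "rb") as f:
        head_b = np.fromfile(f, dtype=np.uint8, count=4)
        body_b = np.fromfile(f, dtype=np.uint8, count=12)
finally:
    os.remove(path)

print(head_a.tolist())  # [0, 1, 2, 3]
print(np.array_equal(head_a, head_b), np.array_equal(body_a, body_b))  # True True
```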
Here is the code in Matlab and Python.
Matlab:

```matlab
% Define input and output file paths
inputFilePath = 'C:/Users/x/Downloads/Raw.dat'
outputFilePath = 'C:/Users/x/Downloads/decoded.dat'

% Open the input file
fileID = fopen(inputFilePath, 'r');
% Open the output file
outputFileID = fopen(outputFilePath, 'w');

% Check if files are successfully opened
if fileID == -1
    error('Cannot open the input file.');
end
if outputFileID == -1
    error('Cannot open the output file.');
end

% Constants
packetNumberBytesLength = 4;
packetDataBytesRows = 12;
packetDataBytesCols = 1024;
packetDataMasks = [1,2,4,8,16,32];
numSamples = 6;
numChannels = 1024;

% Read and process each packet
while ~feof(fileID)
    % Read packet number bytes
    packetNumberBytes = fread(fileID, packetNumberBytesLength, 'uint8');
    if numel(packetNumberBytes) < packetNumberBytesLength
        break;
    end

    % Read packet data bytes
    packetDataBytes = fread(fileID, [packetDataBytesRows, packetDataBytesCols], 'uint8');
    if numel(packetDataBytes) < packetDataBytesRows * packetDataBytesCols
        break;
    end

    % Decode packet number
    packetNumber = [1677216, 65536, 256, 1] * packetNumberBytes;

    % Decoding packet data
    Samples = zeros(numSamples, numChannels);
    for n = 1:numSamples
        Samples(n, :) = [2048,1024,512,256,128,64,32,16,8,4,2,1] * bitshift(bitand(packetDataBytes, packetDataMasks(n)), 1-n);
        Samples(n, numChannels) = 0; % Invalid sample in case of Single Cell scan mode
    end

    % Get current date and time
    currentDateTime = datestr(now, 'dd-mmm-yyyy HH:MM:SS.FFF');

    % Write decoded data to output file
    writematrix(currentDateTime, outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix([packetNumber, 1], outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix(Samples(1, :), outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix([3], outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix(Samples(3, :), outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix([5], outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
    writematrix(Samples(5, :), outputFilePath, 'Delimiter', 'tab', 'WriteMode', 'append');
end

% Close the input and output files
fclose(fileID);
fclose(outputFileID);
```
Python:

```python
import numpy as np
import datetime

# Define input and output file paths
input_file_path = 'C:/Users/x/Downloads/Raw.dat'
output_file_path = 'C:/Users/x/Downloads/decoded.dat'

# Constants
packet_number_bytes_length = 4
packet_data_bytes_rows = 12
packet_data_bytes_cols = 1024
num_samples = 6
num_channels = 1024
packet_data_masks = [1, 2, 4, 8, 16, 32]

# Open the input and output files
try:
    with open(input_file_path, 'rb') as input_file, open(output_file_path, 'w') as output_file:
        # Read and process each packet
        while True:
            # Read packet number bytes
            packet_number_bytes = np.fromfile(input_file, dtype=np.uint8, count=packet_number_bytes_length)
            if len(packet_number_bytes) < packet_number_bytes_length:
                break

            # Read packet data bytes
            packet_data_bytes = np.fromfile(input_file, dtype=np.uint8,
                                            count=packet_data_bytes_rows * packet_data_bytes_cols)
            if len(packet_data_bytes) < packet_data_bytes_rows * packet_data_bytes_cols:
                break

            # Reshape the packet data into a 2D array
            packet_data = packet_data_bytes.reshape((packet_data_bytes_rows, packet_data_bytes_cols))

            # Decode packet number
            packet_number = np.dot([1677216, 65536, 256, 1], np.frombuffer(packet_number_bytes, dtype=np.uint8))

            # Decode packet data
            Samples = np.zeros((num_samples, num_channels), dtype=int)
            for n in range(num_samples):
                Samples[n, :] = np.dot(
                    [2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1],
                    np.right_shift(np.bitwise_and(packet_data, packet_data_masks[n]), n)
                )
                Samples[:, num_channels - 1] = 0  # Invalid sample in case of Single Cell scan mode

            # Get current date and time
            current_date_time = datetime.datetime.now().strftime('%d-%b-%Y %H:%M:%S.%f')[:-3]

            # Write decoded data to output file
            output_file.write(current_date_time + '\n')
            output_file.write(f"{packet_number}\t1\n")
            np.savetxt(output_file, Samples[0, :].reshape(1, -1), delimiter='\t', fmt='%d')
            output_file.write('3\n')
            np.savetxt(output_file, Samples[2, :].reshape(1, -1), delimiter='\t', fmt='%d')
            output_file.write('5\n')
            np.savetxt(output_file, Samples[4, :].reshape(1, -1), delimiter='\t', fmt='%d')
except FileNotFoundError as e:
    print(f"Error: {e}")
```
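The different values you get when reading and decoding the `.dat` file are most likely not caused by byte order (endianness), but by the order in which the packet bytes are arranged into the 12 × 1024 matrix.

The problem:

- Byte order: this is not the culprit here. Every read uses `uint8`, a single byte has no byte order, and both scripts assemble the packet number from the individual bytes with explicit weights. (MATLAB's `fopen` also defaults to the machine's native byte order, which is little-endian on a typical x86 PC, not big-endian.)
- Bit shifts: the shift logic is equivalent. MATLAB masks with `packetDataMasks(n)` = 2^(n-1) and shifts by `1-n` for n = 1..6, while the Python code masks with `packet_data_masks[n]` = 2^n and shifts right by `n` for n = 0..5. Both extract the same bit, and shifting a single byte does not depend on byte order.
- Matrix layout: `fread(fileID, [12, 1024], 'uint8')` fills the 12 × 1024 matrix in column-major order (down each column first), whereas `packet_data_bytes.reshape((12, 1024))` uses NumPy's default row-major (C) order. The bytes read from the file are identical, but they end up in different positions, so `packet_data` looks completely different from the MATLAB matrix and every decoded sample changes.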
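The layout difference is easy to see on a small stand-in for the 12 × 1024 packet (3 × 4 here, chosen only for readability): the same twelve bytes reshaped in NumPy's default C order and in Fortran (column-major) order, which is how MATLAB's `fread` fills a `[rows, cols]` output.

```python
import numpy as np

# Twelve consecutive bytes as they would appear in the file: 0, 1, ..., 11
raw = np.arange(12, dtype=np.uint8)

# NumPy default (order='C'): fills row by row
c_order = raw.reshape((3, 4))
# order='F': fills column by column, like MATLAB's fread(fid, [3, 4], 'uint8')
f_order = raw.reshape((3, 4), order="F")

print(c_order)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
print(f_order)
# [[ 0  3  6  9]
#  [ 1  4  7 10]
#  [ 2  5  8 11]]

# Equivalent alternative: reshape to (cols, rows) and transpose
print(np.array_equal(f_order, raw.reshape((4, 3)).T))  # True
```

Either `order='F'` or the reshape-then-transpose form should reproduce MATLAB's matrix; `order='F'` keeps the shape arguments identical to the MATLAB call.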
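Solution:

- Read the packet number with an explicit byte order (optional): if you want to read the four packet-number bytes as one integer instead of decoding them by hand, tell NumPy explicitly that they are big-endian:

```python
packet_number_bytes = np.fromfile(input_file, dtype='>u4', count=1)
```

Here `'>u4'` means one unsigned 32-bit integer (uint32) read in big-endian order (`>`). It gives the same value as the manual dot product with the weights `[16777216, 65536, 256, 1]`; note that both scripts above use `1677216` as the first weight, which looks like a typo for 16777216 (2^24).

- Simplify the decoding: NumPy's `unpackbits` function can replace the manual mask-and-shift operations. Unpacking along a new trailing axis keeps the 12 × 1024 structure, so bit `n` of every byte is available as a 12 × 1024 plane:

```python
# bits[r, c, k] is bit k (k = 0 is the least significant bit) of packet_data[r, c]
bits = np.unpackbits(packet_data[:, :, np.newaxis], axis=2, bitorder='little')
weights = np.array([2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1])

Samples = np.zeros((num_samples, num_channels), dtype=int)
for n in range(num_samples):
    Samples[n, :] = weights @ bits[:, :, n]  # weighted sum over the 12 rows
```

This code first unpacks `packet_data` into individual bits and then computes `Samples` with a matrix product. It still needs `packet_data` in the same column-major layout as MATLAB's matrix.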
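One way to confirm that an `unpackbits`-based decoding matches the original mask-and-shift loop is to run both on the same matrix. The sketch below uses a random `packet_data` (hypothetical data, not from Raw.dat), unpacks along a new trailing axis with `bitorder="little"`, and also shows an `np.einsum` variant that computes all six samples at once.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, num_samples = 12, 1024, 6
weights = np.array([2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1])

# Hypothetical packet matrix (random bytes, not data from Raw.dat)
packet_data = rng.integers(0, 256, size=(rows, cols), dtype=np.uint8)

# Reference: the original mask-and-shift decoding, one bit plane at a time
ref = np.zeros((num_samples, cols), dtype=np.int64)
for n in range(num_samples):
    ref[n, :] = weights @ np.right_shift(np.bitwise_and(packet_data, 1 << n), n)

# unpackbits along a new trailing axis: bits[r, c, k] is bit k of packet_data[r, c],
# with k = 0 the least significant bit because of bitorder="little"
bits = np.unpackbits(packet_data[:, :, np.newaxis], axis=2, bitorder="little")
unpacked = np.stack([weights @ bits[:, :, n] for n in range(num_samples)])

# Same computation for all six samples at once: sum over the row axis r
unpacked_einsum = np.einsum("r,rck->kc", weights, bits[:, :, :num_samples])

print(np.array_equal(ref, unpacked), np.array_equal(ref, unpacked_einsum))  # True True
```

The check compares only the decoding step; the zeroed last column (Single Cell scan mode) would be applied afterwards in either version.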
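Modified Python code:

```python
import numpy as np
import datetime

# ... (rest of the code unchanged)

weights = np.array([2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1])

with open(input_file_path, 'rb') as input_file, open(output_file_path, 'w') as output_file:
    while True:
        # Read the packet number as one big-endian uint32
        packet_number_bytes = np.fromfile(input_file, dtype='>u4', count=1)
        if len(packet_number_bytes) < 1:
            break
        packet_number = int(packet_number_bytes[0])  # plain integer value

        # Read the packet data bytes as before
        packet_data_bytes = np.fromfile(input_file, dtype=np.uint8,
                                        count=packet_data_bytes_rows * packet_data_bytes_cols)
        if len(packet_data_bytes) < packet_data_bytes_rows * packet_data_bytes_cols:
            break
        # order='F' fills column by column, like MATLAB's fread(fileID, [12, 1024], 'uint8')
        packet_data = packet_data_bytes.reshape((packet_data_bytes_rows, packet_data_bytes_cols), order='F')

        # Decode packet data with unpackbits (bit k of each byte is bits[:, :, k])
        bits = np.unpackbits(packet_data[:, :, np.newaxis], axis=2, bitorder='little')
        Samples = np.zeros((num_samples, num_channels), dtype=int)
        for n in range(num_samples):
            Samples[n, :] = weights @ bits[:, :, n]
        Samples[:, num_channels - 1] = 0  # Invalid sample in case of Single Cell scan mode

        # ... (rest of the code unchanged)
```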
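The modified code reads the packet number with dtype `'>u4'`. The sketch below, using four made-up header bytes, illustrates that byte order makes no difference for `uint8` data, that `'>u4'` matches the manual dot product whose first weight is 2^24 = 16777216, and that `'<u4'` would interpret the same bytes in reverse.

```python
import numpy as np

# Four hypothetical packet-number bytes, most significant byte first
header = bytes([0x00, 0x01, 0x02, 0x03])

# Single bytes have no byte order: uint8 gives the bytes back unchanged
as_uint8 = np.frombuffer(header, dtype=np.uint8)
print(as_uint8.tolist())  # [0, 1, 2, 3]

# Manual decoding with powers of 256 (2**24 = 16777216)
manual = int(np.dot([16777216, 65536, 256, 1], as_uint8))

# The same four bytes read as one big-endian / little-endian uint32
big = int(np.frombuffer(header, dtype=">u4")[0])
little = int(np.frombuffer(header, dtype="<u4")[0])

print(manual, big, little)  # 66051 66051 50462976
```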
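With these changes, the Python code should read and decode the `.dat` file correctly and produce the same `Samples` values as the MATLAB code. The packet numbers agree only if both versions use the same weight for the first byte.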
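As a final sanity check, the sketch below decodes one synthetic packet (a hypothetical header of 0, 0, 1, 0 followed by random data bytes) with a hypothetical helper `decode_packet` that combines the `'>u4'` header, the column-major reshape, and the mask-and-shift decoding. It compares every sample with a slow hypothetical reference, `reference_sample`, written directly from MATLAB's column-major rule: counting from 0, element (r, c) of a 12-row matrix is byte r + 12·c.

```python
import numpy as np

ROWS, COLS, NUM_SAMPLES = 12, 1024, 6
WEIGHTS = np.array([2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1])


def decode_packet(raw: bytes):
    """Hypothetical helper: decode one packet of 4 + 12*1024 bytes."""
    header = np.frombuffer(raw[:4], dtype=">u4")[0]
    body = np.frombuffer(raw[4:4 + ROWS * COLS], dtype=np.uint8)
    data = body.reshape((ROWS, COLS), order="F")  # same layout as MATLAB's fread
    samples = np.zeros((NUM_SAMPLES, COLS), dtype=np.int64)
    for n in range(NUM_SAMPLES):
        samples[n, :] = WEIGHTS @ np.right_shift(np.bitwise_and(data, 1 << n), n)
    samples[:, COLS - 1] = 0  # invalid last sample (Single Cell scan mode)
    return int(header), samples


def reference_sample(raw: bytes, n: int, col: int) -> int:
    """Slow check using column-major linear indexing (0-based, after the header)."""
    total = 0
    for row in range(ROWS):
        byte = raw[4 + row + ROWS * col]  # row index varies fastest
        total += int(WEIGHTS[row]) * ((byte >> n) & 1)
    return total


rng = np.random.default_rng(1)
raw = bytes([0, 0, 1, 0]) + rng.integers(0, 256, ROWS * COLS, dtype=np.uint8).tobytes()
number, samples = decode_packet(raw)
print(number)  # 256
print(all(samples[n, c] == reference_sample(raw, n, c)
          for n in range(NUM_SAMPLES) for c in range(COLS - 1)))  # True
```

Running the same comparison on a packet taken from the real Raw.dat, against the MATLAB output for that packet, would be the more convincing test.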