In this section, to further assess the robustness of TBNet, we first apply two attacks, JPEG compression and scaling, to the test images of the CASIA1.0 and Carvalho datasets, and then evaluate TBNet on the attacked images. The results are shown in Figs. 6 and 7. The scaling attack uses scaling ratios of 0.7 and 0.5, and the JPEG compression attack uses quality factors of 70 and 50. MCC CASIA and MCC carvalho denote the MCC of TBNet on the CASIA1.0 and Carvalho datasets, respectively; F1 CASIA and F1 carvalho denote the corresponding F1 scores. Fig. 6 shows that the MCC and F1 curves are essentially flat under the scaling attack, so scaling has almost no effect on TBNet. Under the JPEG compression attack in Fig. 7, the MCC and F1 curves show a slight downward trend, indicating that this attack degrades the performance of TBNet slightly, although TBNet can still mine the remaining tampering traces from all of the channels. We attribute this to the loss of some high-frequency information during JPEG compression, information that is very important for detecting tampered images. These experimental results therefore demonstrate the stability of TBNet.
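For reference, the two pixel-level metrics reported here can be computed from the confusion counts between a predicted mask and a ground-truth mask. The following is a minimal sketch using the standard definitions of F1 and MCC; the function names, the nested-list mask representation, and the toy masks are illustrative assumptions and are not taken from the TBNet implementation.

```python
import math


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 2x4 masks: 1 marks a tampered pixel.
truth = [[1, 1, 0, 0], [1, 1, 0, 0]]
pred = [[1, 1, 0, 0], [1, 0, 1, 0]]

tp, fp, fn, tn = confusion_counts(pred, truth)
print(f1_score(tp, fp, fn))  # 0.75
print(mcc(tp, fp, fn, tn))   # 0.5
```

Unlike F1, MCC also uses the true negatives, which is one reason the two are often reported together when tampered regions cover only a small part of the image.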
Following previous studies [3, 6, 11], we evaluate the robustness of MSMG-Net against four image post-processing methods, namely Gaussian blur, Gaussian noise, JPEG compression, and ISO noise, on the NIST dataset. The detailed results of the robustness analysis are shown in Fig. 8. For a comprehensive evaluation, we vary the kernel size of the Gaussian blur (from 3 to 9), the variance of the Gaussian noise (from 3 to 9), the quality of the JPEG compression (from 50 to 100), and the variance of the ISO noise (from 0.05 to 0.2). As can be observed, Gaussian blur affects the detection performance most severely, in particular with the larger kernel size of 9×9, which blurs images and erases manipulation traces around tampered regions. In addition, compared with the other baselines, MSMG-Net achieves the most robust performance under ISO noise. We attribute these results to our multi-scale multi-grained learning, in which a parallel partial shunted transformer block is designed to learn coarse-to-fine manipulation segmentation features. In summary, our model, MSMG-Net, consistently performs the best among all methods and can effectively tackle the challenges brought by various post-processing methods.
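The effect of the blur kernel size on boundary traces can be illustrated with a one-dimensional toy example. The sketch below blurs an ideal step edge, standing in for a tampering boundary, and measures the largest jump between neighbouring samples. Several choices are assumptions made for illustration only: the odd kernel sizes 3, 5, 7, and 9, the rule that derives sigma from the kernel size (the default rule OpenCV applies when no sigma is given), the border replication, and all function names.

```python
import math


def gaussian_kernel_1d(ksize):
    """Normalized 1-D Gaussian kernel of odd length ksize."""
    # Assumption: sigma follows OpenCV's default rule for a given kernel size.
    sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    half = ksize // 2
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(signal, ksize):
    """Convolve a 1-D signal with the Gaussian kernel, replicating border samples."""
    kernel = gaussian_kernel_1d(ksize)
    half = ksize // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def edge_sharpness(signal):
    """Largest absolute difference between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Ideal step edge standing in for a sharp tampering boundary.
step_edge = [0.0] * 16 + [1.0] * 16

sharpness_by_kernel = {k: edge_sharpness(blur_1d(step_edge, k)) for k in (3, 5, 7, 9)}
for ksize, sharpness in sharpness_by_kernel.items():
    print(ksize, round(sharpness, 3))
```

Under these assumptions the measured sharpness decreases monotonically as the kernel grows, which is consistent with the observation that the 9×9 kernel erases the most boundary evidence.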
In this subsection, we evaluate the robustness of different methods under several common types of post-processing. To this end, we enlarged the PS-scripted bookcover dataset by applying resizing, cropping, and noise addition with different factors to the generated tampered images and saving them with different JPEG qualities. Subsequently, we trained models on the enlarged dataset and tested them on the PS-boundary and PS-arbitrary datasets. Three methods, i.e., Bayar’s 64 × 64, Forensic Similarity, and Mantra-net, were included for comparison, since they perform relatively well in the experiments conducted in Sections IV-C and IV-D.
In real-world scenarios, different tampered images are usually subjected to different JPEG compressions. Training a specific ReLoc model for each JPEG compression under investigation is impractical, because it would be time-consuming and because implementations of JPEG compression vary across manufacturers and software. Therefore, in this subsection, we evaluate the robustness against different JPEG compressions using a single model. The average improvements for QF 70 are relatively small: 0.042, 0.037, and 0.021 for DFCN, SCSE-Unet, and MVSS-net, respectively. When the testing QF is unseen in the training phase (i.e., QF 60), the average improvements achieved by ReLoc for DFCN, SCSE-Unet, and MVSS-net are 0.035, 0.010, and 0.012, respectively, which are still considerable. These experimental results indicate that ReLoc is also effective for improving the robustness against multiple JPEG compressions.
Experimental results under JPEG compression attack. It is well known that JPEG compression is lossy: rounding errors modify pixel values and discard information. The comparative experimental results for the JPEG compression attack are presented in Figure 8(b). The results demonstrate that as the quality factor decreases from 100 to 50, the F1-Score of the other deep learning detection methods declines significantly, while ASGC-Net maintains stable performance. Our module’s JPEG artifact removal capability effectively mitigates the impact of JPEG compression by restoring and enhancing the traces of tampered regions, which significantly improves the performance.
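The rounding loss mentioned above comes from the quantization step of JPEG, in which each transform coefficient is divided by a step size, rounded to an integer, and later multiplied back. The sketch below isolates that step as a deliberate simplification: it omits the DCT, the per-frequency quantization tables, and the mapping from quality factor to step size, and the coefficient values and function name are hypothetical. The only property relied on is that a lower quality factor corresponds to larger quantization steps.

```python
def quantize_roundtrip(coeffs, step):
    """Quantize then dequantize: divide by step, round to an integer, multiply back."""
    return [round(c / step) * step for c in coeffs]


# Hypothetical coefficients ordered from low to high frequency;
# the small trailing values stand in for fine high-frequency detail.
coeffs = [121.0, -37.0, 13.0, 5.0, -3.0, 1.0]

fine = quantize_roundtrip(coeffs, 4)     # small step, mild compression
coarse = quantize_roundtrip(coeffs, 16)  # large step, strong compression

print(fine)    # [120, -36, 12, 4, -4, 0]
print(coarse)  # [128, -32, 16, 0, 0, 0]

# The reconstruction error of each coefficient is bounded by half the step size.
max_err_fine = max(abs(c - q) for c, q in zip(coeffs, fine))
max_err_coarse = max(abs(c - q) for c, q in zip(coeffs, coarse))
print(max_err_fine, max_err_coarse)  # 1.0 7.0
```

With the larger step, the three smallest coefficients are all rounded to zero, which illustrates how stronger compression removes the fine detail that tampering traces often reside in.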
Experimental results under image blending. Although image blending is a widely used technique for post-processing tampered images, robustness against it has not been comprehensively studied in recent research. To generate realistic tampered images, we employ image blending, which seamlessly merges overlapping areas from different images. This method ensures that the transformed pixels blend with the original image, obscuring boundaries and reducing color differences. Hence, it makes distinguishing between authentic and tampered images more challenging.
Experimental results under scaling attack. Scaling down an image reduces its main features, making it challenging to identify tampering. Therefore, existing methods are often ineffective against scaling attacks, particularly image shrinking attacks. Specifically, we resize the images with scaling factors ranging from 0.6 to 1.4 in steps of 0.1.
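A minimal sketch of such a sweep is given below. The factor list follows the range and step stated above; everything else is an assumption made for illustration, in particular the nearest-neighbour interpolation (the interpolation method is not specified here), the nested-list image representation, and the function names.

```python
def scaling_factors(start_tenths=6, stop_tenths=14):
    """Factors 0.6, 0.7, ..., 1.4, built from integers to avoid accumulated float error."""
    return [t / 10 for t in range(start_tenths, stop_tenths + 1)]


def resize_nearest(image, factor):
    """Resize a 2-D nested-list image by `factor` using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [
            image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


# Hypothetical 10x10 test image whose pixel values encode their position.
image = [[10 * y + x for x in range(10)] for y in range(10)]

for factor in scaling_factors():
    scaled = resize_nearest(image, factor)
    print(factor, len(scaled), len(scaled[0]))
```

Building the factors from integer tenths rather than repeatedly adding 0.1 keeps each value equal to its decimal literal, so the sweep contains exactly nine factors.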
Most images are subject to unpredictable content modifications or geometric distortions, such as compression, noise, and resizing. Therefore, it is essential for image tamper detection algorithms to be resilient to these alterations. To assess the effectiveness of our method on the In-The-Wild and Carvalho datasets, we subjected the test images to various attacks, including JPEG compression, image scaling, and image blending.
The proposed method can automatically adjust the receptive field size based on the shape of the tampered target, and it also exhibits significant resistance to external compression, noise, and scaling interference.