
Android Upper-Layer (Framework) WatchDog Study Notes_2

Posted: 2023-09-27 14:33:45
Tags: java, thread, notes, threads, Watchdog, WatchDog, Android, null, public

I. Overview

1. Understanding how WatchDog works gives a better understanding of how system services run.


II. WatchDog Implementation

1. Where the code lives

//frameworks/base/services/core/java/com/android/server/Watchdog.java

public class Watchdog extends Thread {
    ...
}

So Watchdog is a thread (it extends Thread).

2. WatchDog is started in SystemServer.java

run() //SystemServer.java
    startBootstrapServices() //SystemServer.java
        traceBeginAndSlog("StartWatchdog");
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.start();
        traceEnd();
        ...
        traceBeginAndSlog("InitWatchdog");
        watchdog.init(mSystemContext, mActivityManagerService);
        traceEnd();

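So Watchdog is an auxiliary thread running inside SystemServer (the system_server process). Since it is a thread, calling start() is all it takes to get it running; init() is called afterwards, once ActivityManagerService exists.

3. WatchDog constructor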

private Watchdog() {
    super("watchdog");
    // Not checking the background thread; the shared foreground thread is the main checker. Thread name: "android.fg"
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(), "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. only do a quick check since there can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread. Thread name: "android.ui"
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(), "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread. Thread name: "android.io"
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(), "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread. Thread name: "android.display"
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(), "display thread", DEFAULT_TIMEOUT));
    // And the animation thread. Thread name: "android.anim"
    mHandlerCheckers.add(new HandlerChecker(AnimationThread.getHandler(), "animation thread", DEFAULT_TIMEOUT));
    // And the surface animation thread. Thread name: "android.anim.lf"
    mHandlerCheckers.add(new HandlerChecker(SurfaceAnimationThread.getHandler(), "surface animation thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
    mOpenFdMonitor = OpenFdMonitor.create();

    HandlerThread handlerThread = new HandlerThread("workThread"); // "workThread" thread inside system_server (SS)
    handlerThread.start();
    mWorkHandler = new Handler(handlerThread.getLooper()) {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case MESSAGE_AFE_CHECK_ERROR:
                    checkAfeStatus(false);
                    break;

                case MESSAGE_AFE_CHECK_OVER:
                    Slog.i(TAG, "release observer");
                    mFileObserver.stopWatching();
                    mFileObserver = null;
                    checkAfeStatus(true);
                    getLooper().quitSafely();
                    mWorkHandler = null;
                    break;
            }
        }
    };

    // See the notes on DEFAULT_TIMEOUT.
    assert DB || DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}

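Two objects deserve attention: mMonitorChecker and mHandlerCheckers. Note that mMonitorChecker (the foreground-thread checker) is itself also one of the entries in mHandlerCheckers.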

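Sources of the entries in mHandlerCheckers:

(1) Added in the constructor: FgThread (through mMonitorChecker), the main thread, UiThread, IoThread, DisplayThread, AnimationThread and SurfaceAnimationThread

(2) Added externally: Watchdog.getInstance().addThread(handler);

Sources of the monitors held by mMonitorChecker (its mMonitors list):

(1) Added externally: Watchdog.getInstance().addMonitor(monitor);

(2) Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());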


4. WatchDog's run() method

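public void run() {
    while (true) {
        ...
        synchronized (this) {
            // Post a check to every HandlerChecker (each watched thread/monitor set)
            for (int i = 0; i < mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            ... // wait, then evaluate whether any checker is overdue
        }
        // Reaching this point means the system is most likely hung
        ...
        // Trigger the kernel to dump all blocked threads, and backtraces
        // on all CPUs to the kernel log
        doSysRq('w');
        doSysRq('l');
        ...
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            ... // body shown in section III, "5. dropbox"
        };
        dropboxThread.start();
        ...
    }
}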

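run() checks every entry of mHandlerCheckers. If one is found to be stuck, it triggers two SysRq commands, show-blocked-tasks (w) and show-backtrace-all-active-cpus (l), to capture stack backtraces of D-state (uninterruptible) threads and of all active CPUs.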
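How does the loop decide that a checker is "stuck"? In AOSP each HandlerChecker reports one of four states (COMPLETED, WAITING, WAITED_HALF, OVERDUE) based on how long its posted check has been pending, and the loop acts on the worst of them. The sketch below models only that timing decision in plain Java; the class and method names are hypothetical, and the 60-second timeout is an assumed value for the demo.

```java
import java.util.List;

public class CheckerStateDemo {
    // Ordered by severity, like COMPLETED < WAITING < WAITED_HALF < OVERDUE in Watchdog.java
    enum CheckState { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

    // Hypothetical snapshot of one HandlerChecker: has its posted check run, and when was it posted?
    static final class Checker {
        final String name;
        final boolean completed;
        final long startTimeMs;
        final long waitMaxMs;

        Checker(String name, boolean completed, long startTimeMs, long waitMaxMs) {
            this.name = name;
            this.completed = completed;
            this.startTimeMs = startTimeMs;
            this.waitMaxMs = waitMaxMs;
        }

        CheckState state(long nowMs) {
            if (completed) return CheckState.COMPLETED;
            long latency = nowMs - startTimeMs;
            if (latency < waitMaxMs / 2) return CheckState.WAITING;  // still in the first half
            if (latency < waitMaxMs) return CheckState.WAITED_HALF;  // first traces are dumped here
            return CheckState.OVERDUE;                               // treated as a hang
        }
    }

    // The loop reacts to the worst state among all checkers
    static CheckState evaluate(List<Checker> checkers, long nowMs) {
        CheckState worst = CheckState.COMPLETED;
        for (Checker c : checkers) {
            CheckState s = c.state(nowMs);
            if (s.compareTo(worst) > 0) worst = s;
        }
        return worst;
    }

    public static void main(String[] args) {
        long timeoutMs = 60_000; // assumed timeout for the demo
        List<Checker> checkers = List.of(
                new Checker("foreground thread", true, 0, timeoutMs),
                new Checker("ui thread", false, 0, timeoutMs));
        for (long now : new long[] { 10_000, 45_000, 61_000 }) {
            System.out.println(now + " ms -> " + evaluate(checkers, now));
        }
    }
}
```

Running main prints WAITING, WAITED_HALF and OVERDUE for 10, 45 and 61 seconds: the unfinished "ui thread" check dominates the already-completed foreground check.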


5. HandlerChecker's scheduleCheckLocked()

public void scheduleCheckLocked() {
    if (mCompleted) {
        // Safe to update monitors in queue, Handler is not in the middle of work
        mMonitors.addAll(mMonitorQueue);
        mMonitorQueue.clear();
    }
    if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) || (mPauseCount > 0)) {
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }

    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}

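The mMonitors.size() == 0 case covers checkers that only watch a thread's message queue, i.e. every entry in mHandlerCheckers except mMonitorChecker. For these, the quick test is mHandler.getLooper().getQueue().isPolling(): if the looper is idle, waiting in poll for new messages, the thread cannot be stuck, so the check is marked completed immediately without posting anything. (A checker with mPauseCount > 0 is likewise treated as completed.)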

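mMonitorChecker always holds at least one monitor (the constructor adds BinderThreadMonitor), so for it, and for any handler thread that is busy rather than idle, the key step is mHandler.postAtFrontOfQueue(this). The checker records mStartTime, clears mCompleted and posts itself to the front of the thread's queue. It only counts as done once its run() actually executes, which cannot happen while the thread is stuck in the current message.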
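The postAtFrontOfQueue() idea can be reproduced outside Android. In this minimal sketch a LinkedBlockingDeque drained by one thread stands in for a Looper and its MessageQueue; MiniLooper and Checker are hypothetical names, not framework classes. Posting at the front lets the check skip any backlog, but it still cannot run while the current message is blocking the thread, so completed stays false until the thread is freed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;

public class FrontOfQueueDemo {
    // Hypothetical stand-in for a Looper thread: runs queued Runnables one at a time
    static final class MiniLooper extends Thread {
        private final LinkedBlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();

        MiniLooper(String name) { super(name); setDaemon(true); }

        void post(Runnable r) { queue.addLast(r); }                // like Handler.post()
        void postAtFrontOfQueue(Runnable r) { queue.addFirst(r); } // like Handler.postAtFrontOfQueue()

        @Override public void run() {
            try {
                while (true) queue.takeFirst().run();
            } catch (InterruptedException e) {
                // exit the loop
            }
        }
    }

    // Minimal checker: posts itself to the front of the queue; "completed" once it has run
    static final class Checker implements Runnable {
        private final MiniLooper looper;
        volatile boolean completed = true;
        volatile long startTimeMs;

        Checker(MiniLooper looper) { this.looper = looper; }

        synchronized void scheduleCheck() {
            if (!completed) return;            // a check is already in flight
            completed = false;
            startTimeMs = System.currentTimeMillis();
            looper.postAtFrontOfQueue(this);   // skips queued work, but not the running message
        }

        @Override public void run() { completed = true; }
    }

    // Returns {completed while the looper is stuck, completed after it is released}
    static boolean[] demo() throws InterruptedException {
        MiniLooper looper = new MiniLooper("demo-looper");
        looper.start();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        looper.post(() -> {                    // simulate a message that blocks the thread
            started.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        started.await();
        Checker checker = new Checker(looper);
        checker.scheduleCheck();
        Thread.sleep(200);
        boolean whileStuck = checker.completed;
        release.countDown();
        for (int i = 0; i < 100 && !checker.completed; i++) Thread.sleep(20);
        looper.interrupt();
        return new boolean[] { whileStuck, checker.completed };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("while stuck: " + r[0] + ", after release: " + r[1]);
    }
}
```

main prints "while stuck: false, after release: true".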


6. HandlerChecker's run()

public final class HandlerChecker implements Runnable {
    ...
    @Override
    public void run() {
        final int size = mMonitors.size();
        for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }

        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }
    ...
}

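The technique: call each registered monitor's monitor() method and check that it returns.

(1) Here run() calls monitor() on every entry of mMonitors, and only mMonitorChecker has a non-empty mMonitors list, because system services register themselves with addMonitor, for example: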

Watchdog.getInstance().addMonitor(this); //ActivityManagerService.java
Watchdog.getInstance().addMonitor(this); //InputManagerService.java
Watchdog.getInstance().addMonitor(this); //PowerManagerService.java
Watchdog.getInstance().addMonitor(this); //WindowManagerService.java

The monitor() method being called is very simple. For example, ActivityManagerService's:

public void monitor() {
    synchronized (this) { }
}

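This only checks whether the service's lock is being held for a long time: if some other thread holds the AMS lock, monitor() blocks, the check on the android.fg thread never completes, and Watchdog eventually reports the foreground thread as blocked.

(2) Special case: BinderThreadMonitor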
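Why does an empty synchronized (this) { } detect a stuck service? The sketch below shows the effect with a hypothetical FakeService: while another thread holds the service lock, monitor() cannot return. One simplification to note: the real HandlerChecker calls monitor() on the android.fg thread and the watchdog thread measures the delay, whereas here monitor() runs on a throwaway thread and Thread.join(timeout) stands in for the watchdog's timeout.

```java
import java.util.concurrent.CountDownLatch;

public class LockMonitorDemo {
    interface Monitor { void monitor(); }

    // Hypothetical service whose work synchronizes on the service object, like AMS
    static final class FakeService implements Monitor {
        void longOperation(CountDownLatch holding, CountDownLatch release) throws InterruptedException {
            synchronized (this) {
                holding.countDown();
                release.await();          // keep holding the service lock until released
            }
        }

        @Override public void monitor() {
            synchronized (this) { }       // returns only once the lock is free
        }
    }

    // Runs monitor() on a helper thread; true if it finished within timeoutMs
    static boolean monitorReturnsWithin(Monitor m, long timeoutMs) throws InterruptedException {
        Thread t = new Thread(m::monitor, "monitor-check");
        t.setDaemon(true);
        t.start();
        t.join(timeoutMs);
        return !t.isAlive();
    }

    // Returns {monitor ok while idle, monitor ok while the lock is held}
    static boolean[] demo() throws InterruptedException {
        FakeService service = new FakeService();
        boolean idle = monitorReturnsWithin(service, 500);
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try { service.longOperation(holding, release); } catch (InterruptedException ignored) { }
        });
        worker.start();
        holding.await();
        boolean whileHeld = monitorReturnsWithin(service, 300);
        release.countDown();
        worker.join();
        return new boolean[] { idle, whileHeld };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("idle: " + r[0] + ", while lock held: " + r[1]);
    }
}
```

main prints "idle: true, while lock held: false".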

private static final class BinderThreadMonitor implements Watchdog.Monitor {
    @Override
    public void monitor() {
        Binder.blockUntilThreadAvailable();
    }
}

//frameworks/base/core/java/android/os/Binder.java
public static final native void blockUntilThreadAvailable();

//frameworks/native/libs/binder/IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}

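This only checks whether the number of binder threads currently executing in the process has reached mMaxThreads. If all of them are busy (31 in system_server), the call waits until one is freed. By default each process may have at most 15 binder threads, but system_server raises its own limit to 31: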

//frameworks/native/libs/binder/ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15

//frameworks/base/services/java/com/android/server/SystemServer.java
public final class SystemServer {
    private static final int sMaxBinderThreads = 31;

    private void run() {
        BinderInternal.setMaxThreads(sMaxBinderThreads); // set before any services are started
        ...
        startBootstrapServices();
    }
}
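The blockUntilThreadAvailable() loop is a classic counter-plus-condition-variable wait. Below is a rough Java equivalent, assuming a ReentrantLock and Condition in place of the pthread mutex and condition variable; ThreadPoolGate and its methods are hypothetical names, and the comments map each field to its native counterpart. Unlike the native code, awaitThreadAvailable() takes a timeout so the demo can observe the blocking instead of hanging; in system_server the call blocks indefinitely, and it is the Watchdog's own timeout that turns the wait into a detected hang.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ThreadPoolGate {
    private final ReentrantLock lock = new ReentrantLock();   // cf. mThreadCountLock
    private final Condition decrement = lock.newCondition();  // cf. mThreadCountDecrement
    private final int maxThreads;                             // cf. mMaxThreads
    private int executing;                                    // cf. mExecutingThreadsCount

    public ThreadPoolGate(int maxThreads) { this.maxThreads = maxThreads; }

    // A binder thread starts handling a transaction
    public void beginTransaction() {
        lock.lock();
        try { executing++; } finally { lock.unlock(); }
    }

    // A binder thread finishes; wake anyone waiting for a free thread
    public void endTransaction() {
        lock.lock();
        try {
            executing--;
            decrement.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Same loop shape as blockUntilThreadAvailable(), plus a timeout for the demo
    public boolean awaitThreadAvailable(long timeoutMs) throws InterruptedException {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (executing >= maxThreads) {
                if (nanos <= 0) return false;       // still saturated after the timeout
                nanos = decrement.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolGate gate = new ThreadPoolGate(2);
        gate.beginTransaction();
        gate.beginTransaction();                    // both threads busy
        System.out.println("free thread? " + gate.awaitThreadAvailable(100));
        gate.endTransaction();
        System.out.println("free thread? " + gate.awaitThreadAvailable(100));
    }
}
```

With both threads busy the first wait times out (false); after one transaction ends the second returns true immediately.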


7. What WatchDog does after a timeout

public void run() { //Watchdog.java
    ...
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
    Slog.w(TAG, "*** GOODBYE!");
    Process.killProcess(Process.myPid());
    System.exit(10);
    ...
}

It kills its own process (system_server) and exits, which is why system_server has a different pid after a watchdog event.


III. WatchDog log output

1. process stack traces

The output location is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is usually /data/anr. The traces are written by ActivityManagerService.dumpStackTraces().

public class Watchdog extends Thread { //Watchdog.java
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            ...
            if (!fdLimitTriggered) {
                final int waitState = evaluateCheckerCompletionLocked();
                ...
                if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        Slog.i(TAG, "WAITED_HALF");
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(pids, null, null, getInterestingNativePids());
                        waitedHalf = true;
                    }
                    continue;
                }
            }
            ...
            // Overdue: dump the traces again; this file is later attached to the dropbox entry
            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            final File stack = ActivityManagerService.dumpStackTraces(pids, null, null, getInterestingNativePids());
            ...
        }
    }
}

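Note that process stack traces are also dumped once the block has lasted half the timeout (WAITED_HALF), so a first set of traces is captured before the final dump that precedes the kill.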


2. slog

Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);

Slog.w(TAG, "*** GOODBYE!");


3. event log

EventLog.writeEvent(EventLogTags.WATCHDOG, subject);


4. kernel stack traces

// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');

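This triggers the two SysRq commands show-blocked-tasks (w) and show-backtrace-all-active-cpus (l), which print backtraces of D-state (uninterruptible) threads and of all active CPUs to the kernel log.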
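doSysRq() itself is small: as far as I can tell from AOSP, it writes the command character to /proc/sysrq-trigger. Below is a minimal sketch of that write. Making the trigger path a parameter is an assumption added so the code can run against an ordinary temp file off-device; writing the real file requires root and a kernel with SysRq enabled.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SysRqDemo {
    // On a device the trigger file is /proc/sysrq-trigger (root required); here the path is a parameter
    static boolean doSysRq(String triggerPath, char c) {
        try (FileWriter writer = new FileWriter(triggerPath)) {
            writer.write(c);                 // a single character selects the SysRq command
            return true;
        } catch (IOException e) {
            System.err.println("Failed to write to " + triggerPath + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Off-device demo: write to a temp file instead of the real trigger
        Path fake = Files.createTempFile("sysrq-trigger", null);
        doSysRq(fake.toString(), 'w');       // show-blocked-tasks
        System.out.println(Files.readString(fake));
        doSysRq(fake.toString(), 'l');       // show-backtrace-all-active-cpus
        System.out.println(Files.readString(fake));
        Files.delete(fake);
    }
}
```

Each call truncates the file and writes one character, so main prints "w" and then "l".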


5. dropbox

Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
    public void run() {
        // If a watched thread hangs before init() is called, we don't have a
        // valid mActivity. So we can't log the error to dropbox.
        if (mActivity != null) {
            mActivity.addErrorToDropBox("watchdog", null, "system_server", null, null, null, subject, null, stack, null);
        }
        StatsLog.write(StatsLog.SYSTEM_SERVER_WATCHDOG_OCCURRED, subject);
    }
};
dropboxThread.start();

Note that dropbox entries are normally stored under /data/system/dropbox; the directory is set in:

//frameworks/base/services/core/java/com/android/server/DropBoxManagerService.java

public DropBoxManagerService(final Context context) {
    this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
}

 

IV. Why UiThread, IoThread, DisplayThread and FgThread are monitored

1. These four classes extend ServiceThread and are singletons. Take UiThread.java as an example:

//frameworks/base/services/core/java/com/android/server/UiThread.java

public final class UiThread extends ServiceThread {

    private UiThread() {
        super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
    }

    @Override
    public void run() {
        // Make sure UiThread is in the fg stune boost group
        Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
        super.run();
    }

    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new UiThread();
            sInstance.start();
            final Looper looper = sInstance.getLooper();
            looper.setTraceTag(Trace.TRACE_TAG_SYSTEM_SERVER);
            looper.setSlowLogThresholdMs(SLOW_DISPATCH_THRESHOLD_MS, SLOW_DELIVERY_THRESHOLD_MS);
            sHandler = new Handler(sInstance.getLooper());
        }
    }

    public static UiThread get() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    public static Handler getHandler() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sHandler;
        }
    }
}

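(1) get() returns the singleton instance.

(2) getHandler() returns the Handler bound to that thread.

(3) Note that ensureThreadLocked() calls start() right after creating the instance. In other words, the thread is already running as soon as the object has been created, which happens lazily on the first call to get() or getHandler().

Furthermore, all four classes extend ServiceThread, which in turn extends HandlerThread. What matters here is the Handler of each thread, because system services such as AMS, WMS and PMS post work to them.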
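The lazy-start singleton pattern of UiThread can be reproduced in plain Java. LazyServiceThread below is a hypothetical stand-in: a Thread draining its own queue replaces HandlerThread and its Looper, and the static post() method replaces getHandler().post(). As in ensureThreadLocked(), the thread is created and started on the first get(), under the class lock.

```java
import java.util.concurrent.LinkedBlockingQueue;

public final class LazyServiceThread extends Thread {
    private static LazyServiceThread sInstance;              // guarded by LazyServiceThread.class
    private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    private LazyServiceThread() {
        super("demo.ui");                                    // cf. "android.ui"
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (true) queue.take().run();                 // stand-in for Looper.loop()
        } catch (InterruptedException e) {
            // exit the loop
        }
    }

    // Same shape as UiThread.ensureThreadLocked(): create and start on first use
    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new LazyServiceThread();
            sInstance.start();
        }
    }

    public static LazyServiceThread get() {
        synchronized (LazyServiceThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    // Stand-in for UiThread.getHandler().post(r): queue work onto the singleton thread
    public static void post(Runnable r) {
        get().queue.add(r);
    }

    public static void main(String[] args) throws InterruptedException {
        LazyServiceThread first = get();
        System.out.println("same instance: " + (first == get()) + ", alive: " + first.isAlive());
        post(() -> System.out.println("running on " + Thread.currentThread().getName()));
        Thread.sleep(100);                                   // give the daemon thread time to print
    }
}
```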

//frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

final class UiHandler extends Handler {
    public UiHandler() {
        super(com.android.server.UiThread.get().getLooper(), null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SHOW_ERROR_UI_MSG: 
            case SHOW_NOT_RESPONDING_UI_MSG: 
            case SHOW_STRICT_MODE_VIOLATION_UI_MSG:
            case WAIT_FOR_DEBUGGER_UI_MSG:
            case DISPATCH_PROCESSES_CHANGED_UI_MSG:
            case DISPATCH_PROCESS_DIED_UI_MSG:
            case DISPATCH_UIDS_CHANGED_UI_MSG:
            case DISPATCH_OOM_ADJ_OBSERVER_MSG:
        }
    }
} 

UiHandler takes the Looper directly from UiThread. Recall that a thread has exactly one Looper and one MessageQueue, but can have any number of Handlers.
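The one-Looper, many-Handlers relationship can be illustrated with a single-thread executor standing in for UiThread's Looper and MessageQueue. MiniHandler and SOME_OTHER_MSG are hypothetical; the point is that messages from different handlers share one FIFO queue and all run on the same thread, in posting order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class SharedLooperDemo {
    // Hypothetical Handler stand-in: its own callback, but a queue shared with other handlers
    static final class MiniHandler {
        private final ExecutorService looper;
        private final Consumer<String> handleMessage;

        MiniHandler(ExecutorService looper, Consumer<String> handleMessage) {
            this.looper = looper;
            this.handleMessage = handleMessage;
        }

        void sendMessage(String msg) { looper.execute(() -> handleMessage.accept(msg)); }
    }

    static List<String> demo() throws InterruptedException {
        // One thread with one FIFO queue, standing in for UiThread's Looper/MessageQueue
        ExecutorService looper = Executors.newSingleThreadExecutor(r -> new Thread(r, "demo.ui"));
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        MiniHandler uiHandler = new MiniHandler(looper,
                m -> log.add("UiHandler:" + m + "@" + Thread.currentThread().getName()));
        MiniHandler otherHandler = new MiniHandler(looper,
                m -> log.add("Other:" + m + "@" + Thread.currentThread().getName()));
        uiHandler.sendMessage("SHOW_ERROR_UI_MSG");
        otherHandler.sendMessage("SOME_OTHER_MSG");
        uiHandler.sendMessage("SHOW_NOT_RESPONDING_UI_MSG");
        looper.shutdown();
        looper.awaitTermination(5, TimeUnit.SECONDS);
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

All three lines end in "@demo.ui" and appear in posting order, even though two different handlers sent them.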

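The messages handled in handleMessage (error dialogs, ANR dialogs and so on) show that UI is not necessarily updated from the main thread: these dialogs are shown from the android.ui thread. (Android's guidance says the UI must be updated on the main thread; what ViewRootImpl actually enforces is that only the thread that created a view hierarchy may touch its views.)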


2. Differences in usage

UiThread --> ActivityManagerService

DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService

IoThread --> PackageInstallerService, StorageManagerService, BluetoothManagerService

 

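V. Summary

1. The core objects of Watchdog are mHandlerCheckers and mMonitorChecker.

mHandlerCheckers: detects whether a thread's message queue is blocked.

mMonitorChecker: detects whether a core system service holds its lock for a long time.

For mHandlerCheckers entries, the check uses mHandler.getLooper().getQueue().isPolling() (an idle looper passes immediately) and otherwise a check posted to the front of the queue, which must run before the timeout. mMonitorChecker additionally calls each monitor's monitor(), which typically just enters synchronized(this). In particular, BinderThreadMonitor decides whether there is a timeout by checking whether all binder threads are busy, i.e. whether the number of executing binder threads has reached the maximum.

2. After a timeout the system prints a series of logs, which can be used to analyze the problem effectively.

3. After a timeout, Watchdog kills its own process, so the pid of system_server changes.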

 

 

 

Reference:
Android internals analysis blog, Android WatchDog Principle Analysis: https://blog.csdn.net/weixin_28543661/article/details/117344345

 

From: https://www.cnblogs.com/hellokitty2/p/17732653.html
