Every thread that calls the lwIP API should be created with lwIP's sys_thread_new.
mingdu.zheng at gmail dot com
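Solution
Every thread that calls the lwIP API should be created with lwIP's sys_thread_new.
Symptom
Execution reaches the statement **timeouts->next->time -= time_needed;** in sys_sem_wait and raises a BusFault exception. Inspection shows that timeouts->next is 0 at that point, so the BusFault comes from an attempt to write through a null pointer.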
void
sys_sem_wait(sys_sem_t sem)
{
  ......
 again:
  timeouts = sys_arch_timeouts();
  if (!timeouts || !timeouts->next) {
    sys_arch_sem_wait(sem, 0);
  } else {
    ......
    if (time_needed == SYS_ARCH_TIMEOUT) {
      ......
      goto again;
    } else {
      if (time_needed < timeouts->next->time) {
        timeouts->next->time -= time_needed;
      } else {
        timeouts->next->time = 0;
      }
    }
  }
}
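Analysis
Reading the source of sys_sem_wait and sys_arch_timeouts leads to this conclusion: lwIP assumes that each thread has its own timeout list, and it takes no lock when it manipulates that list. A thread created with sys_thread_new gets its own timeout list from sys_thread_new. A thread created any other way falls back to the default timeout list. Once two or more threads use the default list, their accesses race with each other because nothing locks the list. For example, one thread can pass the !timeouts->next check and then be preempted while another thread removes the last entry, so timeouts->next is already null when the first thread writes to it. Therefore every thread that calls the lwIP API should be created with sys_thread_new; at most one thread may be created some other way.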
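The race on a shared timeout list can be replayed deterministically in a single thread, which makes the null pointer easier to see than in a live system. The sketch below is only an illustration: the race_* names and the two simplified structures are hypothetical stand-ins, not lwIP's real definitions, and the "preemption" is simulated by calling the second thread's step between the first thread's check and its write.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for lwIP's timeout structures. */
struct race_timeout {
    struct race_timeout *next;
    unsigned int time;
};

struct race_timeouts {
    struct race_timeout *next;
};

/* Thread A, step 1: the same guard sys_sem_wait applies before using the list. */
static int race_list_has_entry(const struct race_timeouts *list)
{
    return list != NULL && list->next != NULL;
}

/* Thread B: unlink the head entry, as when a timeout fires or is cancelled. */
static void race_pop_head(struct race_timeouts *list)
{
    assert(list != NULL);
    if (list->next != NULL) {
        list->next = list->next->next;
    }
}

/*
 * Replays the order "A checks, B removes, A writes" on one shared list.
 * Returns 1 if thread A's write would go through a null pointer.
 */
int race_replay(void)
{
    struct race_timeout entry = { NULL, 100 };
    struct race_timeouts shared = { &entry };

    int a_passed_check = race_list_has_entry(&shared); /* thread A */
    race_pop_head(&shared);                            /* thread B preempts A */

    /* Thread A would now execute: shared.next->time -= time_needed; */
    return a_passed_check && shared.next == NULL;
}
```

Calling race_replay() returns 1: thread A passed its guard, yet the pointer it is about to write through is null. With one list per thread this interleaving cannot occur, because thread B never touches thread A's list.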
// Returns a pointer to the per-thread sys_timeouts structure. In lwIP, each
// thread has a list of timeouts which is represented as a linked list of
// sys_timeout structures. The sys_timeouts structure holds a pointer to a
// linked list of timeouts. This function is called by the lwIP timeout
// scheduler and must not return a NULL value.
//
// In a single thread sys_arch implementation, this function will simply return
// a pointer to a global sys_timeouts variable stored in the sys_arch module.
//
struct sys_timeouts *
sys_arch_timeouts(void)
{
  cyg_handle_t handle;
  struct lwip_thread *t;

  handle = cyg_thread_self();
  for (t = threads; t; t = t->next)
    if (t->handle == handle)
      return &(t->to);
  return &to;
}
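To make the fallback behaviour concrete, here is a small self-contained model of the same lookup. It is a sketch under assumptions, not the port's code: an int stands in for cyg_handle_t, the handle is passed in explicitly instead of being read from cyg_thread_self, and lookup_register is a hypothetical helper that represents the bookkeeping a sys_thread_new-style creator is assumed to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sys_timeouts; the list body is a stub. */
struct lookup_timeouts {
    void *next;
};

struct lookup_thread {
    struct lookup_thread *next;
    int handle;                 /* stands in for cyg_handle_t */
    struct lookup_timeouts to;  /* private list, one per registered thread */
};

static struct lookup_thread *lookup_threads;  /* threads recorded at creation */
static struct lookup_timeouts lookup_default; /* shared fallback list */

/* Assumed creator-side bookkeeping: record the thread and give it a list. */
void lookup_register(struct lookup_thread *t, int handle)
{
    assert(t != NULL);
    t->handle = handle;
    t->to.next = NULL;
    t->next = lookup_threads;
    lookup_threads = t;
}

/* Same lookup shape as sys_arch_timeouts, keyed by an explicit handle. */
struct lookup_timeouts *lookup_timeouts_for(int handle)
{
    struct lookup_thread *t;

    for (t = lookup_threads; t != NULL; t = t->next) {
        if (t->handle == handle) {
            return &t->to;
        }
    }
    return &lookup_default; /* every unregistered thread lands here */
}
```

After registering two threads with handles 1 and 2, lookup_timeouts_for(1) and lookup_timeouts_for(2) return different lists, while any two unregistered handles, say 7 and 8, both receive &lookup_default. That shared pointer is the reason only one thread created outside sys_thread_new can be tolerated.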