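1. Operating system timers
The operating system timer is a timing facility provided by the RT-Thread kernel; a timer can run in hardware-timer mode or in software-timer mode. In one product I used several timers to drive a data-communication indicator LED. The idea is to create one periodic timer and one one-shot timer: when data is transferred, the periodic timer is started to make the LED blink, and the one-shot timer is started so that, when it times out, it switches off both the LED and the periodic timer. When data arrives again the timers are restarted, which gives the blinking effect.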
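The two-timer scheme can be sketched as a small tick-driven model in plain C. This is not RT-Thread code: the struct, the function names and the two tick constants are hypothetical, and the sketch assumes that new data only re-arms the one-shot timeout while the LED is already blinking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values, chosen only for illustration. */
#define BLINK_PERIOD_TICKS 5u   /* periodic timer: toggle the LED every 5 ticks */
#define IDLE_TIMEOUT_TICKS 20u  /* one-shot timer: stop after 20 ticks without data */

struct led_indicator {
    bool     led_on;      /* current LED state */
    bool     blinking;    /* true while the periodic timer is active */
    unsigned blink_left;  /* ticks until the periodic timer fires */
    unsigned idle_left;   /* ticks until the one-shot timer fires */
};

void led_init(struct led_indicator *ind)
{
    assert(ind != NULL);
    ind->led_on = false;
    ind->blinking = false;
    ind->blink_left = 0;
    ind->idle_left = 0;
}

/* Called whenever data is transferred: start blinking, re-arm the timeout. */
void led_on_data(struct led_indicator *ind)
{
    assert(ind != NULL);
    if (!ind->blinking) {
        ind->blinking = true;
        ind->blink_left = BLINK_PERIOD_TICKS;
    }
    ind->idle_left = IDLE_TIMEOUT_TICKS;
}

/* Advance the model by one system tick. */
void led_tick(struct led_indicator *ind)
{
    assert(ind != NULL);
    if (!ind->blinking)
        return;
    if (--ind->blink_left == 0) {       /* periodic timer expired */
        ind->led_on = !ind->led_on;
        ind->blink_left = BLINK_PERIOD_TICKS;
    }
    if (--ind->idle_left == 0) {        /* one-shot timer expired */
        ind->blinking = false;          /* stop the periodic timer */
        ind->led_on = false;            /* and switch the LED off */
    }
}
```

In the real product the two roles are played by RT-Thread timers, and every data event ends up calling rt_timer_start, which is why that function is on the hot path in the problem described next.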
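2. The timer hang problem
This timer is started from two threads with different priorities. After the program has been running for a long time, the timer hangs: execution stays forever inside the loop for (; row_head[row_lvl] != timer_list[row_lvl].prev; row_head[row_lvl] = row_head[row_lvl]->next) in rt_timer_start.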
rt_err_t rt_timer_start(rt_timer_t timer)
{
    unsigned int row_lvl;
    rt_list_t *timer_list;
    register rt_base_t level;
    rt_list_t *row_head[RT_TIMER_SKIP_LIST_LEVEL];
    unsigned int tst_nr;
    static unsigned int random_nr;

    /* timer check */
    RT_ASSERT(timer != RT_NULL);
    RT_ASSERT(rt_object_get_type(&timer->parent) == RT_Object_Class_Timer);

    /* stop timer firstly */
    level = rt_hw_interrupt_disable();
    /* remove timer from list */
    _rt_timer_remove(timer);
    /* change status of timer */
    timer->parent.flag &= ~RT_TIMER_FLAG_ACTIVATED;

    RT_OBJECT_HOOK_CALL(rt_object_take_hook, (&(timer->parent)));

    /*
     * get timeout tick,
     * the max timeout tick shall not great than RT_TICK_MAX/2
     */
    RT_ASSERT(timer->init_tick < RT_TICK_MAX / 2);
    timer->timeout_tick = rt_tick_get() + timer->init_tick;

#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* insert timer to soft timer list */
        timer_list = rt_soft_timer_list;
    }
    else
#endif
    {
        /* insert timer to system timer list */
        timer_list = rt_timer_list;
    }

    row_head[0] = &timer_list[0];
    for (row_lvl = 0; row_lvl < RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        for (; row_head[row_lvl] != timer_list[row_lvl].prev;
             row_head[row_lvl] = row_head[row_lvl]->next)
        {
            struct rt_timer *t;
            rt_list_t *p = row_head[row_lvl]->next;

            /* fix up the entry pointer */
            t = rt_list_entry(p, struct rt_timer, row[row_lvl]);

            /* If we have two timers that timeout at the same time, it's
             * preferred that the timer inserted early get called early.
             * So insert the new timer to the end the the some-timeout timer
             * list.
             */
            if ((t->timeout_tick - timer->timeout_tick) == 0)
            {
                continue;
            }
            else if ((t->timeout_tick - timer->timeout_tick) < RT_TICK_MAX / 2)
            {
                break;
            }
        }
        if (row_lvl != RT_TIMER_SKIP_LIST_LEVEL - 1)
            row_head[row_lvl + 1] = row_head[row_lvl] + 1;
    }

    /* Interestingly, this super simple timer insert counter works very very
     * well on distributing the list height uniformly. By means of "very very
     * well", I mean it beats the randomness of timer->timeout_tick very easily
     * (actually, the timeout_tick is not random and easy to be attacked). */
    random_nr++;
    tst_nr = random_nr;

    rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - 1],
                         &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - 1]));
    for (row_lvl = 2; row_lvl <= RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        if (!(tst_nr & RT_TIMER_SKIP_LIST_MASK))
            rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - row_lvl],
                                 &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - row_lvl]));
        else
            break;
        /* Shift over the bits we have tested. Works well with 1 bit and 2
         * bits. */
        tst_nr >>= (RT_TIMER_SKIP_LIST_MASK + 1) >> 1;
    }

    timer->parent.flag |= RT_TIMER_FLAG_ACTIVATED;

    /* enable interrupt */
    rt_hw_interrupt_enable(level);

#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* check whether timer thread is ready */
        if ((soft_timer_status == RT_SOFT_TIMER_IDLE) &&
            ((timer_thread.stat & RT_THREAD_STAT_MASK) == RT_THREAD_SUSPEND))
        {
            /* resume timer thread to check soft timer */
            rt_thread_resume(&timer_thread);
            rt_schedule();
        }
    }
#endif

    return RT_EOK;
}
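Inspecting the timer list at that point shows that the last node of rt_timer_list points to itself. The list has become a dead list, so the for loop above turns into an infinite loop that can never exit. The program appears to hang, and no task responds.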
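One plausible way for a node to end up pointing at itself is a second insert of a node that is already linked. The sketch below uses a hypothetical minimal circular doubly linked list (struct dl_node and its helpers are illustrative names, written in the same style as an rt_list_t insert but not copied from RT-Thread) and a bounded walk so that the demonstration itself cannot hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly linked list. */
struct dl_node {
    struct dl_node *next;
    struct dl_node *prev;
};

void dl_init(struct dl_node *n)
{
    assert(n != NULL);
    n->next = n->prev = n;
}

/* Link n directly after l. Assumes n is NOT currently in any list. */
void dl_insert_after(struct dl_node *l, struct dl_node *n)
{
    l->next->prev = n;
    n->next = l->next;
    l->next = n;
    n->prev = l;
}

/* Walk from head->next until head is reached again.
 * Returns the number of nodes visited, or -1 if head is not
 * reached within max_steps (the list no longer closes on head). */
int dl_walk(const struct dl_node *head, int max_steps)
{
    const struct dl_node *p = head->next;
    int steps = 0;

    while (p != head) {
        if (++steps > max_steps)
            return -1;
        p = p->next;
    }
    return steps;
}
```

Inserting the same node after the head twice, with no removal in between, leaves node.next == &node. A walk that waits to get back to the list head then never terminates, which matches the symptom seen in the debugger.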
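3. Solution
I updated rt_timer.c to the code from version 4.0.3. After reading the code and the change history in the RT-Thread GitHub repository, my analysis is that, in some rare situation, rt_timer_start is interrupted part-way through: an interrupt or a thread has not yet returned from rt_timer_start when another thread calls rt_timer_start, and this produces the dead list.
Going through the change history of rt_timer.c in detail led me to https://github.com/RT-Thread/rt-thread/issues/3800, "关于硬件定时器的线程安全问题 #3800" (on the thread safety of hardware timers). The discussion there confirmed the cause: two threads in my program both start the same timer, and one line in rt_timer_start runs without interrupts disabled, so execution can be interrupted at that point. With that, the real cause was found.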
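To sum up, the conditions for this problem are very strict; both of the following must hold at the same time:
- Thread A's rt_timer_start is interrupted by thread B, and thread B then calls rt_timer_start.
- Thread A's rt_timer_start and thread B's rt_timer_start operate on the same timer.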
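The two conditions can be replayed deterministically in a single thread. The sketch below is a simplified model, not the RT-Thread implementation: it assumes a restart consists of a "remove" step followed by an "insert" step, and that the preemption window falls between the two. All names (struct tnode, restart_atomic, replay_preempted) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tnode {
    struct tnode *next;
    struct tnode *prev;
};

void tnode_init(struct tnode *n)
{
    assert(n != NULL);
    n->next = n->prev = n;
}

/* Unlink n and leave it pointing at itself (safe to call on a detached node). */
void tnode_remove(struct tnode *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = n->prev = n;
}

/* Link n directly after l. Assumes n is detached. */
void tnode_insert_after(struct tnode *l, struct tnode *n)
{
    l->next->prev = n;
    n->next = l->next;
    l->next = n;
    n->prev = l;
}

/* True if following ->next from head returns to head within max_steps. */
bool tnode_list_ok(const struct tnode *head, int max_steps)
{
    const struct tnode *p = head->next;

    while (max_steps-- > 0) {
        if (p == head)
            return true;
        p = p->next;
    }
    return false;
}

/* Remove and insert back to back: nothing can run in between. */
void restart_atomic(struct tnode *head, struct tnode *timer)
{
    tnode_remove(timer);
    tnode_insert_after(head, timer);
}

/* Replay of the bad schedule on ONE shared timer. */
void replay_preempted(struct tnode *head, struct tnode *timer)
{
    tnode_remove(timer);              /* thread A: remove step            */
    restart_atomic(head, timer);      /* thread B preempts, full restart  */
    tnode_insert_after(head, timer);  /* thread A resumes: timer is already linked */
}
```

Calling restart_atomic any number of times keeps the list well formed, while one call to replay_preempted leaves timer.next == &timer. In this model the protection is simply that both steps stay inside one uninterrupted critical section.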
4. Discussion screenshots
Images in the GitHub discussion sometimes fail to load, so I took screenshots of it to keep for later reference.
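5. Summary
Thread safety in an operating system really matters. When a function is not thread-safe, short test runs may not reveal the problem at all. Fixing this issue gave me a deeper understanding of critical-section protection: preemption between threads, not only interrupts, requires critical sections to be protected.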