This is a bug caused by a kernel-layer driver error. The investigation was fairly convoluted, so I am writing it down.
First Look at the Problem
Test command: `run cts -m CtsHardwareTestCases -t android.hardware.input.cts.tests.GamepadTestCase#testButtonA`
The failure report is as follows:
```xml
<Module name="CtsHardwareTestCases" abi="arm64-v8a" runtime="241587" done="true" pass="0">
  <TestCase name="android.hardware.input.cts.tests.GamepadTestCase">
    <Test result="fail" name="testButtonA">
      <Failure message="java.lang.AssertionError: expected:<1> but was:<0>">
        <StackTrace>java.lang.AssertionError: expected:<1> but was:<0>
          at org.junit.Assert.fail(Assert.java:88)
          at org.junit.Assert.failNotEquals(Assert.java:834)
          at org.junit.Assert.assertEquals(Assert.java:645)
          at org.junit.Assert.assertEquals(Assert.java:631)
          at android.hardware.input.cts.tests.InputTestCase.assertReceivedKeyEvent(InputTestCase.java:145)
          at android.hardware.input.cts.tests.GamepadTestCase.testButtonA(GamepadTestCase.java:41)
```
The test case code is as follows:
```java
@Test
public void testButtonA() throws Exception {
    registerInputDevice(R.raw.gamepad_register_device);

    sendHidCommands(R.raw.gamepad_button_a_down);
    sendHidCommands(R.raw.gamepad_delay);
    assertReceivedKeyEvent(KeyEvent.ACTION_DOWN, KeyEvent.KEYCODE_BUTTON_A);

    sendHidCommands(R.raw.gamepad_button_a_up);
    assertReceivedKeyEvent(KeyEvent.ACTION_UP, KeyEvent.KEYCODE_BUTTON_A);

    assertNoMoreEvents();
}
```
The flow is roughly: register a HID device named "Odie (Test)", send a down event, delay 10 ms, and check that the test activity received the down event; then send an up event and check that the test activity received it.
The stack trace shows the assertion fails because, at the point where an up event should have been received, a down event was received instead, so the case fails.
Problem Analysis
(a) Analysis, part 1:
My first instinct was that the case's timing caused the down event to be treated as a long press (auto-repeat) by the test activity. That is easy to verify with a small change to the case:
```java
private class InputListener implements InputCallback {
    @Override
    public void onKeyEvent(KeyEvent ev) {
        boolean done = false;
        do {
            try {
                //mKeys.put(new KeyEvent(ev));
                // changed to:
                if (ev.getRepeatCount() == 0) {
                    mKeys.put(new KeyEvent(ev));
                }
                done = true;
            } catch (InterruptedException ignore) { }
        } while (!done);
    }
}
```
The case still fails, but the error changes to never receiving the up event. This proves that the CTS test activity InputCtsActivity did not receive the up event. As a cross-check, we opened /dev/input before the test and ran getevent -ltr, which showed:
Abnormal run (F1):
```text
add device 8: /dev/input/event7
  name:     "Odie (Test)"
[   24251.158735] /dev/input/event7: EV_LED  LED_NUML     00000001
[   24251.158735] /dev/input/event7: EV_MSC  MSC_SCAN     00090001
[   24251.158735] /dev/input/event7: EV_KEY  BTN_GAMEPAD  DOWN
[   24251.158735] /dev/input/event7: EV_ABS  ABS_X        0000007f
[   24251.158735] /dev/input/event7: EV_ABS  ABS_Y        0000007f
[   24251.158735] /dev/input/event7: EV_ABS  ABS_Z        0000007f
[   24251.158735] /dev/input/event7: EV_ABS  ABS_RZ       0000007f
[   24251.158735] /dev/input/event7: EV_SYN  SYN_REPORT   00000000 rate 0
(the case hangs here; the case is force-stopped)
[   24416.424598] /dev/input/event7: EV_KEY  BTN_GAMEPAD  UP
[   24416.424612] /dev/input/event7: EV_SYN  SYN_REPORT   00000001 rate 0
remove device 8: /dev/input/event7
```
Normal run (F5):
```text
add device 8: /dev/input/event7
  name:     "Odie (Test)"
[    9500.572331] /dev/input/event7: EV_LED  LED_NUML     00000001
[    9500.572331] /dev/input/event7: EV_MSC  MSC_SCAN     00090001   // HID scan code
[    9500.572331] /dev/input/event7: EV_KEY  BTN_GAMEPAD  DOWN
[    9500.572331] /dev/input/event7: EV_ABS  ABS_X        0000007f
[    9500.572331] /dev/input/event7: EV_ABS  ABS_Y        0000007f
[    9500.572331] /dev/input/event7: EV_ABS  ABS_Z        0000007f
[    9500.572331] /dev/input/event7: EV_ABS  ABS_RZ       0000007f
[    9500.572331] /dev/input/event7: EV_SYN  SYN_REPORT   00000000 (report)   // one key press completed
[    9500.582252] /dev/input/event7: EV_MSC  MSC_SCAN     00090001
[    9500.582252] /dev/input/event7: EV_KEY  BTN_GAMEPAD  UP
[    9500.582252] /dev/input/event7: EV_SYN  SYN_REPORT   00000000 rate 100
remove device 8: /dev/input/event7
```
So the up event is never received while the case is running; only after the case fails and the test activity InputCtsActivity pauses is the up event finally delivered:
```text
11-07 03:03:58.819 30370 30370 D LifecycleMonitor: Lifecycle status change: android.hardware.input.cts.InputCtsActivity@f112acb in: PAUSED
11-07 03:03:58.820  4792 17838 D ActivityTrigger: activityResumeTrigger: The activity in ApplicationInfo{9d04809 com.android.launcher3} is now in focus and seems to be in full-screen mode
11-07 03:03:58.820  4792 17838 E ActivityTrigger: activityResumeTrigger: not whiteListedcom.android.launcher3/com.android.launcher3.Launcher/28
11-07 03:03:58.824 30370 30370 W ViewRootImpl[InputCtsActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BUTTON_A, scanCode=304, metaState=0, flags=0xa8, repeatCount=0, eventTime=12083079, downTime=12082572, deviceId=14, source=0x501 }
```
(b) Analysis, part 2:
As noted above, the up event was never received. First check whether the low-level driver actually reported it, by adding some logging in input_event, the path every input event from the kernel must pass through:
http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/kernel/msm-4.14/drivers/input/input.c#510

```c
void input_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
{
    unsigned long flags;

    printk("weijuncheng input_event, dev = %s\n", dev->name);
    printk("weijuncheng input_event, type = %x\n", type);
    printk("weijuncheng input_event, code = %x\n", code);
    printk("weijuncheng input_event, value = %x\n", value);
    printk("---------------------------------------------------->a3");
    dump_stack();
    printk("---------------------------------------------------->a4");

    if (is_event_supported(type, dev->evbit, EV_MAX)) {
        spin_lock_irqsave(&dev->event_lock, flags);
        printk("---------------------------------------------------->a5");
        printk("weijuncheng input_event, dev = %s\n", dev->name);
        printk("weijuncheng input_event, type = %x\n", type);
        printk("weijuncheng input_event, code = %x\n", code);
        printk("weijuncheng input_event, value = %x\n", value);
        input_handle_event(dev, type, code, value);
        printk("---------------------------------------------------->a6");
        spin_unlock_irqrestore(&dev->event_lock, flags);
#ifdef CONFIG_LAST_TOUCH_EVENTS
        touch_press_release_events_collect(dev, type, code, value);
#endif
    }
}
EXPORT_SYMBOL(input_event);
```
The relevant log:
```text
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a3
11-08 03:16:25.894 root 0 0 I : CPU: 1 PID: 4803 Comm: HidDeviceHandle Tainted: G S 4.14.76-perf-g92d7574-dirty #3
11-08 03:16:25.894 root 0 0 I Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 CEPHEUS (DT)
11-08 03:16:25.894 root 0 0 I Call trace:
11-08 03:16:25.894 root 0 0 I : dump_backtrace+0x0/0x188
11-08 03:16:25.894 root 0 0 I : show_stack+0x18/0x20
11-08 03:16:25.894 root 0 0 I : dump_stack+0xc4/0x100
11-08 03:16:25.894 root 0 0 I : input_event+0x78/0x268
11-08 03:16:25.894 root 0 0 I : hidinput_hid_event+0x498/0x530   // turns the HID report into events the input subsystem understands
11-08 03:16:25.894 root 0 0 I : hid_process_event+0x12c/0x168
11-08 03:16:25.894 root 0 0 I : hid_report_raw_event+0x540/0x5d8
11-08 03:16:25.894 root 0 0 I : hid_input_report+0x178/0x1a8
11-08 03:16:25.894 root 0 0 I : uhid_char_write+0x390/0x5b8
11-08 03:16:25.894 root 0 0 I : __vfs_write+0x44/0x138   // reads the input data written from user space
11-08 03:16:25.894 root 0 0 I : vfs_write+0xc8/0x188
11-08 03:16:25.894 root 0 0 I : SyS_write+0x54/0xb8
11-08 03:16:25.894 root 0 0 I : el0_svc_naked+0x34/0x38
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a4
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a5
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, dev = Odie (Test)
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, type = 4
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, code = 4
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, value = 90001
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a6
```
In the end there was no trace of an up input event ever being reported, while the down event was reported normally. So a problem inside the event-delivery path is unlikely, and the first suspicion is that the up event was never sent down from user space at all. Let's look at the transmission flow.
The case delivers events through hid commands; the relevant code is at http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/cmds/hid/src/com/android/commands/hid/Hid.java#46
The key code, tidied up:
```java
private void process(Event e) {
    final int index = mDevices.indexOfKey(e.getId());
    if (index >= 0) {
        Device d = mDevices.valueAt(index);
        if (Event.COMMAND_DELAY.equals(e.getCommand())) {
            d.addDelay(e.getDuration());
        } else if (Event.COMMAND_REPORT.equals(e.getCommand())) {
            d.sendReport(e.getReport());
        } else {
            if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
                error("Device id=" + e.getId() + " is already registered. Ignoring event.");
            } else {
                error("Unknown command \"" + e.getCommand() + "\". Ignoring event.");
            }
        }
    } else if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
        registerDevice(e);
    } else {
        Log.e(TAG, "Unknown device id specified. Ignoring event.");
    }
}
```
frameworks/base/cmds/hid/src/com/android/commands/hid/Device.java
```java
public void sendReport(byte[] report) {
    Message msg = mHandler.obtainMessage(MSG_SEND_REPORT, report);
    // if two messages are sent at identical time, they will be processed in order received
    mHandler.sendMessageAtTime(msg, mTimeToSend);
}

public void addDelay(int delay) {
    mTimeToSend = Math.max(SystemClock.uptimeMillis(), mTimeToSend) + delay;
}

@Override
public void handleMessage(Message msg) {
    switch (msg.what) {
        case MSG_OPEN_DEVICE:
            SomeArgs args = (SomeArgs) msg.obj;
            mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
                    (byte[]) args.arg2, new DeviceCallback());
            pauseEvents();
            break;
        case MSG_SEND_REPORT:
            if (mPtr != 0) {
                nativeSendReport(mPtr, (byte[]) msg.obj);
            } else {
                Log.e(TAG, "Tried to send report to closed device.");
            }
            break;
        case MSG_CLOSE_DEVICE:
            ...
        default:
            throw new IllegalArgumentException("Unknown device message");
    }
}
```
frameworks/base/cmds/hid/jni/com_android_commands_hid_Device.cpp
```cpp
static JNINativeMethod sMethods[] = {
    { "nativeOpenDevice",
            "(Ljava/lang/String;III[B"
            "Lcom/android/commands/hid/Device$DeviceCallback;)J",
            reinterpret_cast<void*>(openDevice) },
    { "nativeSendReport", "(J[B)V", reinterpret_cast<void*>(sendReport) },
    { "nativeCloseDevice", "(J)V", reinterpret_cast<void*>(closeDevice) },
};

static void sendReport(JNIEnv* env, jclass /* clazz */, jlong ptr, jbyteArray rawReport) {
    size_t size;
    std::unique_ptr<uint8_t[]> report = getData(env, rawReport, size);
    uhid::Device* d = reinterpret_cast<uhid::Device*>(ptr);
    if (d) {
        d->sendReport(report.get(), size);
    } else {
        LOGE("Could not send report, Device* is null!");
    }
}

void Device::sendReport(uint8_t* report, size_t reportSize) {
    struct uhid_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.type = UHID_INPUT2;
    ev.u.input2.size = reportSize;
    memcpy(&ev.u.input2.data, report, reportSize);
    ssize_t ret = TEMP_FAILURE_RETRY(::write(mFd, &ev, sizeof(ev)));  // the event is written to the kernel here
    if (ret < 0 || ret != sizeof(ev)) {
        LOGE("Failed to send hid event: %s", strerror(errno));
    }
}
```
By adding logs we found that handleMessage for the up event was never run, so the up event was never passed downward. The analysis thus turns back to the upper layer: we need to find out why handleMessage was never invoked.
(c) Analysis, part 3:
To go further we need a rough understanding of Handler. In short, one thread sends messages to another thread that holds a Looper; the Looper thread takes Messages off the queue in its loop() and processes them.
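The shape of this mechanism can be sketched with a plain-Java toy model (no Android classes; `MiniLooper`, the message strings, and the `QUIT` sentinel are all made up for illustration): one thread blocks taking messages off a queue while any other thread may enqueue them, which is essentially what Looper.loop() and Handler.sendMessage() do.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MiniLooper {
    // Toy stand-in for the MessageQueue: strings instead of Message objects.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final StringBuilder handled = new StringBuilder();

    // The "looper thread": blocks in take() until a message arrives,
    // just like Looper.loop() blocks inside MessageQueue.next().
    private final Thread looperThread = new Thread(() -> {
        try {
            while (true) {
                String msg = queue.take();      // might block
                if (msg.equals("QUIT")) return; // toy equivalent of Looper.quit()
                handled.append(msg);            // toy equivalent of dispatchMessage()
            }
        } catch (InterruptedException ignored) { }
    });

    public void start() { looperThread.start(); }

    // Any thread can enqueue; only the looper thread processes.
    public void send(String msg) { queue.add(msg); }

    // Drain and stop; returns everything the looper thread handled, in order.
    public String finish() throws InterruptedException {
        queue.add("QUIT");
        looperThread.join();
        return handled.toString();
    }

    public static void main(String[] args) throws Exception {
        MiniLooper looper = new MiniLooper();
        looper.start();
        looper.send("DOWN");
        looper.send("UP");
        System.out.println(looper.finish()); // prints DOWNUP
    }
}
```

The real MessageQueue adds delivery times, idle handlers, and sync barriers on top of this basic producer/consumer shape; the barriers turn out to matter below.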
So there are only two possibilities: 1. the message was never enqueued; 2. it was enqueued but never processed.
First, the not-enqueued case. Here the message is posted via sendMessageAtTime; the rough flow:
frameworks/base/core/java/android/os/Handler.java
```java
public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
    MessageQueue queue = mQueue;
    if (queue == null) {
        RuntimeException e = new RuntimeException(
                this + " sendMessageAtTime() called with no mQueue");
        Log.w("Looper", e.getMessage(), e);
        return false;
    }
    return enqueueMessage(queue, msg, uptimeMillis);
}

private boolean enqueueMessage(MessageQueue queue, Message msg, long uptimeMillis) {
    msg.target = this;
    if (mAsynchronous) {
        msg.setAsynchronous(true);
    }
    return queue.enqueueMessage(msg, uptimeMillis);
}
```
frameworks/base/core/java/android/os/MessageQueue.java
```java
boolean enqueueMessage(Message msg, long when) {
    if (msg.target == null) {
        throw new IllegalArgumentException("Message must have a target.");
    }
    if (msg.isInUse()) {
        throw new IllegalStateException(msg + " This message is already in use.");
    }

    synchronized (this) {
        if (mQuitting) {
            IllegalStateException e = new IllegalStateException(
                    msg.target + " sending message to a Handler on a dead thread");
            Log.w(TAG, e.getMessage(), e);
            msg.recycle();
            return false;
        }

        msg.markInUse();
        msg.when = when;
        Message p = mMessages;
        boolean needWake;
        if (p == null || when == 0 || when < p.when) {
            // New head, wake up the event queue if blocked.
            msg.next = p;
            mMessages = msg;
            needWake = mBlocked;  // wake the queue if it was empty before, or the new message becomes the head
        } else {
            // Inserted within the middle of the queue. Usually we don't have to wake
            // up the event queue unless there is a barrier at the head of the queue
            // and the message is the earliest asynchronous message in the queue.
            needWake = mBlocked && p.target == null && msg.isAsynchronous();  // with other messages already queued, only a special case (an async message behind a barrier) needs to wake the queue
            Message prev;
            for (;;) {
                prev = p;
                p = p.next;
                if (p == null || when < p.when) {
                    break;
                }
                if (needWake && p.isAsynchronous()) {
                    needWake = false;
                }
            }
            msg.next = p; // invariant: p == prev.next
            prev.next = msg;  // insert at the proper position
        }

        // We can assume mPtr != 0 because mQuitting is false.
        if (needWake) {
            nativeWake(mPtr);  // wake the queue
        }
    }
    return true;
}
```
Note that enqueueMessage returns whether the enqueue succeeded, so this is easy to check: print the return value in the case. It turns out the enqueue succeeds. That leaves the second possibility: the message is in the queue but never taken out and executed. That flow lives in http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/core/java/android/os/Looper.java#137:
```java
public static void loop() {
    final Looper me = myLooper();
    ...
    final MessageQueue queue = me.mQueue;
    ...
    for (;;) {
        Message msg = queue.next(); // might block
        ...
        try {
            msg.target.dispatchMessage(msg);
            dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
        } finally {
            if (traceTag != 0) {
                Trace.traceEnd(traceTag);
            }
        }
        ...
        msg.recycleUnchecked();
    }
}
```
So it must be stuck at `Message msg = queue.next(); // might block`, which also explains why the test hangs and produces no more logs after the failure, until the CTS framework's overall timeout fires. Now look at MessageQueue.next():
```java
Message next() {
    // Return here if the message loop has already quit and been disposed.
    // This can happen if the application tries to restart a looper after quit
    // which is not supported.
    final long ptr = mPtr;
    if (ptr == 0) {
        return null;
    }

    int pendingIdleHandlerCount = -1; // -1 only during first iteration
    int nextPollTimeoutMillis = 0;
    for (;;) {
        if (nextPollTimeoutMillis != 0) {
            Binder.flushPendingCommands();
        }

        nativePollOnce(ptr, nextPollTimeoutMillis);  // with an empty queue this waits until woken (by the next Message being enqueued)

        synchronized (this) {
            // Try to retrieve the next message. Return if found.
            final long now = SystemClock.uptimeMillis();
            Message prevMsg = null;
            Message msg = mMessages;
            if (msg != null && msg.target == null) {
                // Stalled by a barrier. Find the next asynchronous message in the queue.
                do {
                    prevMsg = msg;
                    msg = msg.next;
                } while (msg != null && !msg.isAsynchronous());
            }
            if (msg != null) {
                if (now < msg.when) {
                    // Next message is not ready. Set a timeout to wake up when it is ready.
                    nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);  // the queue has a Message, but the earliest one is not due yet
                } else {
                    // Got a message.
                    mBlocked = false;
                    if (prevMsg != null) {
                        prevMsg.next = msg.next;
                    } else {
                        mMessages = msg.next;
                    }
                    msg.next = null;
                    if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                    msg.markInUse();
                    return msg;  // once due, take the Message out and set mBlocked = false
                }
            } else {
                // No more messages.
                nextPollTimeoutMillis = -1;  // block indefinitely on an empty queue, until woken
            }

            // Process the quit message now that all pending messages have been handled.
            if (mQuitting) {
                dispose();
                return null;
            }

            // If first time idle, then get the number of idlers to run.
            // Idle handles only run if the queue is empty or if the first message
            // in the queue (possibly a barrier) is due to be handled in the future.
            if (pendingIdleHandlerCount < 0
                    && (mMessages == null || now < mMessages.when)) {
                pendingIdleHandlerCount = mIdleHandlers.size();
            }
            if (pendingIdleHandlerCount <= 0) {
                // No idle handlers to run. Loop and wait some more.
                mBlocked = true;  // unless a Message was returned, nextPollTimeoutMillis is > 0 or -1 (head not due yet, or queue empty); mark the queue blocked and enter the next wait cycle
                continue;
            }

            if (mPendingIdleHandlers == null) {
                mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
            }
            mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
        }

        // Run the idle handlers.
        // We only ever reach this code block during the first iteration.
        for (int i = 0; i < pendingIdleHandlerCount; i++) {
            final IdleHandler idler = mPendingIdleHandlers[i];
            mPendingIdleHandlers[i] = null; // release the reference to the handler

            boolean keep = false;
            try {
                keep = idler.queueIdle();
            } catch (Throwable t) {
                Log.wtf(TAG, "IdleHandler threw exception", t);
            }

            if (!keep) {
                synchronized (this) {
                    mIdleHandlers.remove(idler);
                }
            }
        }

        // Reset the idle handler count to 0 so we do not run them again.
        pendingIdleHandlerCount = 0;

        // While calling an idle handler, a new message could have been delivered
        // so go back and look again for a pending message without waiting.
        nextPollTimeoutMillis = 0;
    }
}
```
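The way next() chooses nextPollTimeoutMillis can be condensed into a small stand-alone sketch (the helper below is hypothetical, not Android code): a null head means an empty queue, a future due time yields a finite wait, and a message that is already due would be returned instead of waiting.

```java
public class NextPollTimeout {
    // Mirrors the three branches in MessageQueue.next():
    //   -1  : queue empty, block in nativePollOnce until nativeWake
    //   > 0 : head message not due yet, wait until it is
    //   0   : head message is due; next() would return it rather than poll again
    static int nextPollTimeoutMillis(Long headWhen, long now) {
        if (headWhen == null) {
            return -1;                        // no messages: block indefinitely
        }
        if (now < headWhen) {
            return (int) Math.min(headWhen - now, Integer.MAX_VALUE); // not due yet
        }
        return 0;                             // due now: a message is returned
    }

    public static void main(String[] args) {
        System.out.println(nextPollTimeoutMillis(null, 1000L));  // -1
        System.out.println(nextPollTimeoutMillis(1500L, 1000L)); // 500
        System.out.println(nextPollTimeoutMillis(900L, 1000L));  // 0
    }
}
```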
Since the case is hung, it is easy to grab a trace to see where it is stuck. Sure enough, it is stuck in nativePollOnce:
```text
"main" prio=10 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c01ee8 self=0x7758e14c00
  | sysTid=11568 nice=-8 cgrp=default sched=0/0 handle=0x77dec83548
  | state=S schedstat=( 171782606 895523 56 ) utm=11 stm=5 core=7 HZ=100
  | stack=0x7fde185000-0x7fde187000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x0d54e4d2> (a java.lang.Object)
  at com.android.commands.hid.Device.close(Device.java:107)
  - locked <0x0d54e4d2> (a java.lang.Object)
  at com.android.commands.hid.Hid.run(Hid.java:90)
  at com.android.commands.hid.Hid.main(Hid.java:60)
  at com.android.internal.os.RuntimeInit.nativeFinishInit(Native method)
  at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:388)

"HidDeviceHandler" prio=10 tid=10 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c31a98 self=0x7758f8f000
  | sysTid=11583 nice=0 cgrp=default sched=0/0 handle=0x7751a484f0
  | state=S schedstat=( 8710574 6102134 7 ) utm=0 stm=0 core=4 HZ=100
  | stack=0x7751945000-0x7751947000 stackSize=1041KB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/11583/stack)
  native: #00 pc 000000000007aacc  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 0000000000014d48  /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
  native: #02 pc 0000000000014c18  /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
  native: #03 pc 00000000001271b8  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:326)
  at android.os.Looper.loop(Looper.java:162)
  at android.os.HandlerThread.run(HandlerThread.java:65)
```
With an empty queue the timeout is -1 and the poll waits forever, but it should be woken as soon as the next Message is enqueued, and we already verified that the up Message was enqueued. That is a contradiction, so there must be something we have not considered yet.
First, the wait flow of nativePollOnce.
frameworks/base/core/jni/android_os_MessageQueue.cpp
```cpp
static const JNINativeMethod gMessageQueueMethods[] = {
    /* name, signature, funcPtr */
    ...
    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
    ...
};

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
        jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis);
    mPollObj = NULL;
    mPollEnv = NULL;
    ...
}
```
/system/core/libutils/Looper.cpp
```cpp
int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        while (mResponseIndex < mResponses.size()) {
            const Response& response = mResponses.itemAt(mResponseIndex++);
            int ident = response.request.ident;
            if (ident >= 0) {
                int fd = response.request.fd;
                int events = response.events;
                void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE
                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
                        "fd=%d, events=0x%x, data=%p",
                        this, ident, fd, events, data);
#endif
                if (outFd != NULL) *outFd = fd;
                if (outEvents != NULL) *outEvents = events;
                if (outData != NULL) *outData = data;
                return ident;  // for requests without a callback, ident is always >= 0
            }
        }

        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != NULL) *outFd = 0;
            if (outEvents != NULL) *outEvents = 0;
            if (outData != NULL) *outData = NULL;
            return result;  // result from pollInner; this is the callback case: since a callback was set, the event has already been handled, so no event details need to be returned to the caller
        }

        result = pollInner(timeoutMillis);
    }
}
```
```cpp
int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;  // recompute the timeout
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    // We are about to idle.
    mPolling = true;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // No longer idling.
    mPolling = false;

    // Acquire lock.
    mLock.lock();

    // Rebuild epoll set if needed.
    if (mEpollRebuildRequired) {
        mEpollRebuildRequired = false;
        rebuildEpollLocked();
        goto Done;
    }

    // Check for poll error.
    if (eventCount < 0) {  // a negative return value means an error occurred
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
        result = POLL_ERROR;
        goto Done;
    }

    // Check for poll timeout.
    if (eventCount == 0) {  // zero means the wait timed out, so jump straight to Done
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = POLL_TIMEOUT;
        goto Done;
    }

    // Handle all events.
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif

    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeEventFd) {
            if (epollEvents & EPOLLIN) {  // the queue was woken via wake()
                awoken();
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on wake event fd.", epollEvents);
            }
        } else {
            ssize_t requestIndex = mRequests.indexOfKey(fd);
            if (requestIndex >= 0) {
                int events = 0;
                if (epollEvents & EPOLLIN) events |= EVENT_INPUT;
                if (epollEvents & EPOLLOUT) events |= EVENT_OUTPUT;
                if (epollEvents & EPOLLERR) events |= EVENT_ERROR;
                if (epollEvents & EPOLLHUP) events |= EVENT_HANGUP;
                // An event on some other monitored fd: build a Response from the Request
                // and push it onto the Response array; the Request is later invoked.
                pushResponse(events, mRequests.valueAt(requestIndex));
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
                        "no longer registered.", epollEvents, fd);
            }
        }
    }
Done: ;

    // Invoke pending message callbacks.
    mNextMessageUptime = LLONG_MAX;
    while (mMessageEnvelopes.size() != 0) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
        if (messageEnvelope.uptime <= now) {
            // Remove the envelope from the list.
            // We keep a strong reference to the handler until the call to handleMessage
            // finishes. Then we drop it so that the handler can be deleted *before*
            // we reacquire our lock.
            { // obtain handler
                sp<MessageHandler> handler = messageEnvelope.handler;
                Message message = messageEnvelope.message;
                mMessageEnvelopes.removeAt(0);
                mSendingMessage = true;
                mLock.unlock();

#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
                        this, handler.get(), message.what);
#endif
                handler->handleMessage(message);
            } // release handler

            mLock.lock();
            mSendingMessage = false;
            result = POLL_CALLBACK;
        } else {
            // The last message left at the head of the queue determines the next wakeup time.
            mNextMessageUptime = messageEnvelope.uptime;
            break;
        }
    }

    // Release lock.
    mLock.unlock();

    // Invoke all response callbacks.
    for (size_t i = 0; i < mResponses.size(); i++) {
        Response& response = mResponses.editItemAt(i);
        if (response.request.ident == POLL_CALLBACK) {
            int fd = response.request.fd;
            int events = response.events;
            void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                    this, response.request.callback.get(), fd, events, data);
#endif
            // Invoke the callback. Note that the file descriptor may be closed by
            // the callback (and potentially even reused) before the function returns so
            // we need to be a little careful when removing the file descriptor afterwards.
            int callbackResult = response.request.callback->handleEvent(fd, events, data);  // invoke the callback registered for this fd
            if (callbackResult == 0) {
                removeFd(fd, response.request.seq);
            }

            // Clear the callback reference in the response structure promptly because we
            // will not clear the response vector itself until the next poll.
            response.request.callback.clear();
            result = POLL_CALLBACK;
        }
    }
    return result;
}
```
When a Looper is initialized it creates an event fd: `mWakeEventFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);`
As described above, when the queue holds no Message the looper waits indefinitely; when a new message is enqueued, the MessageQueue is woken again.
In enqueueMessage:
```java
if (needWake) {
    nativeWake(mPtr);
}
```
```cpp
static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

void NativeMessageQueue::wake() {
    mLooper->wake();
}
```
```cpp
void Looper::wake() {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ wake", this);
#endif

    uint64_t inc = 1;
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd, &inc, sizeof(uint64_t)));
    if (nWrite != sizeof(uint64_t)) {
        if (errno != EAGAIN) {
            LOG_ALWAYS_FATAL("Could not write wake signal to fd %d: %s",
                    mWakeEventFd, strerror(errno));
        }
    }
}
```
The write to mWakeEventFd wakes the epoll wait. If the looper was not woken, needWake must have been false. If the MessageQueue were empty, needWake would depend only on whether the looper was currently blocked, so a wake would necessarily happen; the only remaining suspicion is that the MessageQueue was not empty.
Based on this branch, I strongly suspected that a barrier Message kept the queue from being woken:
```java
needWake = mBlocked && p.target == null && msg.isAsynchronous();  // with other messages already queued, only a special case (an async message behind a barrier) needs to wake the queue
```
Back to the case:
```java
@Override
public void handleMessage(Message msg) {
    switch (msg.what) {
        case MSG_OPEN_DEVICE:
            SomeArgs args = (SomeArgs) msg.obj;
            mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
                    (byte[]) args.arg2, new DeviceCallback());
            pauseEvents();

private class DeviceCallback {
    public void onDeviceOpen() {
        mHandler.resumeEvents();
    }
```
Handling MSG_OPEN_DEVICE posts a barrier (pauseEvents), which is removed only when onDeviceOpen in DeviceCallback runs (resumeEvents). If that callback never fires, we get exactly this symptom: the up message is enqueued but never executed. Adding a log in the case confirmed the callback was indeed never invoked. So the problem moves back down the stack, and we need to find out why the callback is never called.
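The barrier scenario can be reproduced with a condensed toy model of enqueueMessage() and next() (all names below are made up for illustration; this is not the real MessageQueue, which posts barriers via postSyncBarrier): with a barrier at the head, enqueueing a synchronous message neither wakes the looper nor lets next() return anything.

```java
import java.util.ArrayList;
import java.util.List;

public class BarrierDemo {
    // Toy message: target == null marks a sync barrier, as in MessageQueue.
    static class Msg {
        final String target;   // null => barrier
        final boolean async;
        Msg(String target, boolean async) { this.target = target; this.async = async; }
    }

    static final List<Msg> queue = new ArrayList<>();
    static boolean blocked = true; // the looper is parked in nativePollOnce

    // Condensed from enqueueMessage(): when the new message is not the new head,
    // the queue is only woken for an async message behind a barrier at the head.
    static boolean enqueue(Msg msg) {
        boolean needWake;
        if (queue.isEmpty()) {
            needWake = blocked;
        } else {
            needWake = blocked && queue.get(0).target == null && msg.async;
        }
        queue.add(msg);
        return needWake; // true => nativeWake(mPtr) would be called
    }

    // Condensed from next(): behind a barrier only async messages are eligible;
    // null means "nothing to dispatch, block indefinitely".
    static Msg next() {
        if (!queue.isEmpty() && queue.get(0).target == null) {
            for (Msg m : queue) {
                if (m.async) return m;
            }
            return null;
        }
        return queue.isEmpty() ? null : queue.get(0);
    }

    public static void main(String[] args) {
        enqueue(new Msg(null, false));                 // barrier posted while handling MSG_OPEN_DEVICE
        boolean woke = enqueue(new Msg("hid", false)); // the synchronous "up" report message
        System.out.println(woke);                      // false: the looper is never woken
        System.out.println(next() == null);            // true: the message is enqueued but stuck
        System.out.println(enqueue(new Msg("hid", true))); // true: an async message would wake it
    }
}
```

This matches the observed symptom precisely: enqueueMessage returns true (the message is in the queue), yet the looper sleeps forever because the barrier is never removed.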
(d) Analysis, part 4:
First, look at where this callback is invoked:
```cpp
int Device::handleEvents(int events) {
    if (events & (ALOOPER_EVENT_ERROR | ALOOPER_EVENT_HANGUP)) {
        LOGE("uhid node was closed or an error occurred. events=0x%x", events);
        mDeviceCallback->onDeviceError();
        return 0;
    }
    struct uhid_event ev;
    ssize_t ret = TEMP_FAILURE_RETRY(::read(mFd, &ev, sizeof(ev)));
    if (ret < 0) {
        LOGE("Failed to read from uhid node: %s", strerror(errno));
        mDeviceCallback->onDeviceError();
        return 0;
    }

    if (ev.type == UHID_OPEN) {
        mDeviceCallback->onDeviceOpen();
    }

    return 1;
}

static int handleLooperEvents(int /* fd */, int events, void* data) {
    Device* d = reinterpret_cast<Device*>(data);
    return d->handleEvents(events);
}

Device::Device(int32_t id, int fd, std::unique_ptr<DeviceCallback> callback) :
        mId(id), mFd(fd), mDeviceCallback(std::move(callback)) {
    ALooper* aLooper = ALooper_forThread();
    if (aLooper == NULL) {
        LOGE("Could not get ALooper, ALooper_forThread returned NULL");
        aLooper = ALooper_prepare(ALOOPER_PREPARE_ALLOW_NON_CALLBACKS);
    }
    ALooper_addFd(aLooper, fd, 0, ALOOPER_EVENT_INPUT, handleLooperEvents,
            reinterpret_cast<void*>(this));
}
```
The uhid fd is registered here via ALooper_addFd, and its events are then handled in pollInner:
for (size_t i = 0; i < mResponses.size(); i++) {
    Response& response = mResponses.editItemAt(i);
    if (response.request.ident == POLL_CALLBACK) {
        int fd = response.request.fd;
        int events = response.events;
        void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
        ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                this, response.request.callback.get(), fd, events, data);
#endif
        // Invoke the callback.  Note that the file descriptor may be closed by
        // the callback (and potentially even reused) before the function returns so
        // we need to be a little careful when removing the file descriptor afterwards.
        int callbackResult = response.request.callback->handleEvent(fd, events, data);  // the callback is invoked here
        if (callbackResult == 0) {
            removeFd(fd, response.request.seq);
        }

        // Clear the callback reference in the response structure promptly because we
        // will not clear the response vector itself until the next poll.
        response.request.callback.clear();
        result = POLL_CALLBACK;
    }
}
return result;
So whether the onDeviceOpen callback runs depends entirely on the UHID_OPEN event — evidently UHID_OPEN was never delivered up from the kernel. The kernel documentation, uhid.txt, explains:
The first thing you should do is sending an UHID_CREATE2 event. This will register the device. UHID will respond with an UHID_START event. You can now start sending data to and reading data from UHID. However, unless UHID sends the UHID_OPEN event, the internally attached HID Device Driver has no user attached. That is, you might put your device asleep unless you receive the UHID_OPEN event. If you receive the UHID_OPEN event, you should start I/O. If the last user closes the HID device, you will receive an UHID_CLOSE event. This may be followed by an UHID_OPEN event again and so on. There is no need to perform reference-counting in user-space. That is, you will never receive multiple UHID_OPEN events without an UHID_CLOSE event. The HID subsystem performs ref-counting for you. You may decide to ignore UHID_OPEN/UHID_CLOSE, though. I/O is allowed even though the device may have no users.
The first thing you should do is sending an UHID_CREATE2 event. This will register the device. UHID will respond with an UHID_START event.
That is, userspace sends UHID_CREATE2 down to the kernel (UHID), and the kernel replies with UHID_START and then UHID_OPEN back up to userspace.
With some added logs, the flow becomes roughly clear.
First, the work item is initialized in uhid_char_open (http://guard.pt.miui.com/opengrok/xref/v10-p-dipper-dev/kernel/msm-4.9/drivers/hid/uhid.c#642):
static int uhid_char_open(struct inode *inode, struct file *file)
{
    struct uhid_device *uhid;

    uhid = kzalloc(sizeof(*uhid), GFP_KERNEL);
    if (!uhid)
        return -ENOMEM;

    mutex_init(&uhid->devlock);
    mutex_init(&uhid->report_lock);
    spin_lock_init(&uhid->qlock);
    init_waitqueue_head(&uhid->waitq);
    init_waitqueue_head(&uhid->report_wait);
    uhid->running = false;
    INIT_WORK(&uhid->worker, uhid_device_add_worker);

    file->private_data = uhid;
    nonseekable_open(inode, file);

    return 0;
}

11-15 01:51:18.455 root 0 0 I : uhid_char_open+0xb4/0x110
11-15 01:51:18.455 root 0 0 I : misc_open+0x12c/0x168
11-15 01:51:18.455 root 0 0 I : chrdev_open+0x12c/0x200
11-15 01:51:18.455 root 0 0 I : do_dentry_open+0x12c/0x2e8
11-15 01:51:18.455 root 0 0 I : vfs_open+0x54/0x88
11-15 01:51:18.455 root 0 0 I : path_openat+0x91c/0x14d0
11-15 01:51:18.455 root 0 0 I : do_filp_open+0x74/0x120
11-15 01:51:18.455 root 0 0 I : do_sys_open+0x134/0x248
11-15 01:51:18.455 root 0 0 I : SyS_openat+0x10/0x18
11-15 01:51:18.455 root 0 0 I : el0_svc_naked+0x34/0x38
Then userspace sends the UHID_CREATE2 event to the kernel:
Device* Device::open(int32_t id, const char* name, int32_t vid, int32_t pid,
        std::unique_ptr<uint8_t[]> descriptor, size_t descriptorSize,
        std::unique_ptr<DeviceCallback> callback) {

    int fd = ::open(UHID_PATH, O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        LOGE("Failed to open uhid: %s", strerror(errno));
        return nullptr;
    }

    struct uhid_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.type = UHID_CREATE2;
    strncpy((char*)ev.u.create2.name, name, UHID_MAX_NAME_LENGTH);
    memcpy(&ev.u.create2.rd_data, descriptor.get(),
            descriptorSize * sizeof(ev.u.create2.rd_data[0]));
    ev.u.create2.rd_size = descriptorSize;
    ev.u.create2.bus = BUS_BLUETOOTH;
    ev.u.create2.vendor = vid;
    ev.u.create2.product = pid;
    ev.u.create2.version = 0;
    ev.u.create2.country = 0;

    errno = 0;
    ssize_t ret = TEMP_FAILURE_RETRY(::write(fd, &ev, sizeof(ev)));  // write to the kernel
    if (ret < 0 || ret != sizeof(ev)) {
        ::close(fd);
        LOGE("Failed to create uhid node: %s", strerror(errno));
        return nullptr;
    }

    // Wait for the device to actually be created.
    ret = TEMP_FAILURE_RETRY(::read(fd, &ev, sizeof(ev)));  // wait for the kernel's reply
    if (ret < 0 || ev.type != UHID_START) {
        ::close(fd);
        LOGE("uhid node failed to start: %s", strerror(errno));
        return nullptr;
    }
    return new Device(id, fd, std::move(callback));
}
When the kernel receives this event, it performs the device creation in uhid_dev_create2:
case UHID_CREATE2:
    ret = uhid_dev_create2(uhid, &uhid->input_buf);
    break;

static int uhid_dev_create2(struct uhid_device *uhid,
                            const struct uhid_event *ev)
{
    struct hid_device *hid;
    size_t rd_size, len;
    void *rd_data;
    int ret;

    if (uhid->running)
        return -EALREADY;

    rd_size = ev->u.create2.rd_size;
    if (rd_size <= 0 || rd_size > HID_MAX_DESCRIPTOR_SIZE)
        return -EINVAL;

    rd_data = kmemdup(ev->u.create2.rd_data, rd_size, GFP_KERNEL);
    if (!rd_data)
        return -ENOMEM;

    uhid->rd_size = rd_size;
    uhid->rd_data = rd_data;

    hid = hid_allocate_device();
    if (IS_ERR(hid)) {
        ret = PTR_ERR(hid);
        goto err_free;
    }

    len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
    strncpy(hid->name, ev->u.create2.name, len);
    len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1;
    strncpy(hid->phys, ev->u.create2.phys, len);
    len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1;
    strncpy(hid->uniq, ev->u.create2.uniq, len);

    hid->ll_driver = &uhid_hid_driver;
    hid->bus = ev->u.create2.bus;
    hid->vendor = ev->u.create2.vendor;
    hid->product = ev->u.create2.product;
    hid->version = ev->u.create2.version;
    hid->country = ev->u.create2.country;
    hid->driver_data = uhid;
    hid->dev.parent = uhid_misc.this_device;

    uhid->hid = hid;
    uhid->running = true;

    /* Adding of a HID device is done through a worker, to allow HID drivers
     * which use feature requests during .probe to work, without they would
     * be blocked on devlock, which is held by uhid_char_write.
     */
    schedule_work(&uhid->worker);

    return 0;

err_free:
    kfree(uhid->rd_data);
    uhid->rd_data = NULL;
    uhid->rd_size = 0;
    return ret;
}
Note schedule_work(&uhid->worker): it queues the work item on the system shared workqueue for later execution. The work handler is uhid_device_add_worker, and it is in that handler's call chain that UHID_START and UHID_OPEN are sent. The added logs show the overall flow:
<4>[  643.029505] weijuncheng ---------------------------------------------------->uhid_queue UHID_START start
<4>[  643.029510] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G W O 4.9.112-perf-g3dd9240-dirty #2
<4>[  643.029511] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
<4>[  643.029519] Workqueue: events uhid_device_add_worker
<4>[  643.029522] Call trace:
<4>[  643.029527] [<ffffff96a8c8a808>] dump_backtrace+0x0/0x2ac
<4>[  643.029529] [<ffffff96a8c8ad34>] show_stack+0x14/0x1c
<4>[  643.029533] [<ffffff96a905c02c>] dump_stack+0x94/0xb4
<4>[  643.029534] [<ffffff96a982a198>] uhid_hid_start+0xb8/0xe4
<4>[  643.029536] [<ffffff96a98234a8>] hid_device_probe+0xfc/0x140
<4>[  643.029539] [<ffffff96a93891e4>] driver_probe_device+0x24c/0x430
<4>[  643.029541] [<ffffff96a93895b0>] __device_attach_driver+0xd4/0x148
<4>[  643.029542] [<ffffff96a9386ad8>] bus_for_each_drv+0x60/0xb0
<4>[  643.029543] [<ffffff96a9388d84>] __device_attach+0x100/0x17c
<4>[  643.029544] [<ffffff96a93897bc>] device_initial_probe+0x10/0x18
<4>[  643.029545] [<ffffff96a93880e0>] bus_probe_device+0x90/0x98
<4>[  643.029547] [<ffffff96a93858d0>] device_add+0x420/0x58c
<4>[  643.029548] [<ffffff96a9823104>] hid_add_device+0xe0/0x2a0
<4>[  643.029550] [<ffffff96a98297f0>] uhid_device_add_worker+0x1c/0x58
<4>[  643.029552] [<ffffff96a8cc8d8c>] process_one_work+0x168/0x45c
<4>[  643.029553] [<ffffff96a8cc90d4>] worker_thread+0x54/0x46c
<4>[  643.029555] [<ffffff96a8ccf838>] kthread+0xe0/0xf4
<4>[  643.029556] [<ffffff96a8c82ee0>] ret_from_fork+0x10/0x30
<4>[  643.029558] weijuncheng ---------------------------------------------------->uhid_queue UHID_START end
<4>[  643.029762] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN start
<4>[  643.029764] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G W O 4.9.112-perf-g3dd9240-dirty #2
<4>[  643.029765] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
<4>[  643.029767] Workqueue: events uhid_device_add_worker
<4>[  643.029768] Call trace:
<4>[  643.029770] [<ffffff96a8c8a808>] dump_backtrace+0x0/0x2ac
<4>[  643.029772] [<ffffff96a8c8ad34>] show_stack+0x14/0x1c
<4>[  643.029774] [<ffffff96a905c02c>] dump_stack+0x94/0xb4
<4>[  643.029775] [<ffffff96a982a9fc>] uhid_hid_open+0x6c/0x9c
<4>[  643.029776] [<ffffff96a9823854>] hidinput_open+0x1c/0x24
<4>[  643.029779] [<ffffff96a958c000>] input_open_device+0x80/0xac
<4>[  643.029781] [<ffffff96a959163c>] input_leds_connect+0xd0/0x2ac
<4>[  643.029782] [<ffffff96a958ca8c>] input_attach_handler+0x19c/0x1fc
<4>[  643.029783] [<ffffff96a958cf24>] input_register_device+0x36c/0x4b4
<4>[  643.029784] [<ffffff96a9823f50>] hidinput_connect+0x43c/0x3d18
<4>[  643.029785] [<ffffff96a9822c98>] hid_connect+0x1f4/0x334
<4>[  643.029786] [<ffffff96a98234bc>] hid_device_probe+0x110/0x140
<4>[  643.029787] [<ffffff96a93891e4>] driver_probe_device+0x24c/0x430
<4>[  643.029788] [<ffffff96a93895b0>] __device_attach_driver+0xd4/0x148
<4>[  643.029789] [<ffffff96a9386ad8>] bus_for_each_drv+0x60/0xb0
<4>[  643.029791] [<ffffff96a9388d84>] __device_attach+0x100/0x17c
<4>[  643.029792] [<ffffff96a93897bc>] device_initial_probe+0x10/0x18
<4>[  643.029793] [<ffffff96a93880e0>] bus_probe_device+0x90/0x98
<4>[  643.029794] [<ffffff96a93858d0>] device_add+0x420/0x58c
<4>[  643.029795] [<ffffff96a9823104>] hid_add_device+0xe0/0x2a0
<4>[  643.029796] [<ffffff96a98297f0>] uhid_device_add_worker+0x1c/0x58
<4>[  643.029798] [<ffffff96a8cc8d8c>] process_one_work+0x168/0x45c
<4>[  643.029799] [<ffffff96a8cc90d4>] worker_thread+0x54/0x46c
<4>[  643.029800] [<ffffff96a8ccf838>] kthread+0xe0/0xf4
<4>[  643.029801] [<ffffff96a8c82ee0>] ret_from_fork+0x10/0x30
<4>[  643.029802] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN end
Next, examine this part of the chain:
<4>[  643.029775] [<ffffff96a982a9fc>] uhid_hid_open+0x6c/0x9c
<4>[  643.029776] [<ffffff96a9823854>] hidinput_open+0x1c/0x24
<4>[  643.029779] [<ffffff96a958c000>] input_open_device+0x80/0xac
<4>[  643.029781] [<ffffff96a959163c>] input_leds_connect+0xd0/0x2ac
<4>[  643.029782] [<ffffff96a958ca8c>] input_attach_handler+0x19c/0x1fc
<4>[  643.029783] [<ffffff96a958cf24>] input_register_device+0x36c/0x4b4
int hid_hw_open(struct hid_device *hdev)
{
    int ret;

    ret = mutex_lock_killable(&hdev->ll_open_lock);
    if (ret)
        return ret;

    if (!hdev->ll_open_count++) {
        ret = hdev->ll_driver->open(hdev);
        if (ret)
            hdev->ll_open_count--;
    }

    mutex_unlock(&hdev->ll_open_lock);
    return ret;
}
EXPORT_SYMBOL_GPL(hid_hw_open);
Note:
if (!hdev->ll_open_count++) {
    ret = hdev->ll_driver->open(hdev);
while inside the driver's open, uhid_hid_open:
static int uhid_hid_open(struct hid_device *hid)
{
    struct uhid_device *uhid = hid->driver_data;
    int retval = 0;

    if (!hid->ll_open_count++) {
        retval = uhid_queue_event(uhid, UHID_OPEN);
        if (retval)
            hid->ll_open_count--;
    }
    return retval;
}
if (!hid->ll_open_count++) {
    retval = uhid_queue_event(uhid, UHID_OPEN);
hid->ll_open_count gets incremented twice — once in hid_hw_open and again in uhid_hid_open — so the !hid->ll_open_count++ check in uhid_hid_open is false and retval = uhid_queue_event(uhid, UHID_OPEN) is never reached. The UHID_OPEN event therefore never propagates up, the callback defined earlier is never invoked, the MessageQueue stays blocked behind the barrier, the up message is never dispatched, and the case fails. As a local verification, hid_hw_open was changed to:
int hid_hw_open(struct hid_device *hdev)
{
/*
    int ret;

    ret = mutex_lock_killable(&hdev->ll_open_lock);
    if (ret)
        return ret;

    if (!hdev->ll_open_count++) {
        ret = hdev->ll_driver->open(hdev);
        if (ret)
            hdev->ll_open_count--;
    }

    mutex_unlock(&hdev->ll_open_lock);
    return ret;
*/
    return hdev->ll_driver->open(hdev);
}
Rebuild the bootimage, flash it, and the test passes.
Problem summary
The root cause is broken hid driver logic after a kernel update: ll_open_count is now counted both in hid-core's hid_hw_open and in uhid's uhid_hid_open. Working through it meant revisiting the Handler/MessageQueue implementation and touching Linux's epoll and workqueue mechanisms (of which I have only a basic grasp); both deserve a closer look later.