一个CTS框架自身引起的问题
问题初探
测试命令: run retry --retry 1 --shard-count 6 -s 45f971f -s 5b01edc -s 9485acea -s 9485acda -s 948bac82 -s 948bac62
报错堆栈:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
10-31 11:30:00 E/CommandScheduler: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1747) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.put(HashMap.java:612) at java.util.HashSet.add(HashSet.java:220) at java.util.AbstractCollection.addAll(AbstractCollection.java:344) at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:452) at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:549) at com.android.tradefed.config.OptionCopier.copyOptions(OptionCopier.java:49) at com.android.tradefed.config.OptionCopier.copyOptionsNoThrow(OptionCopier.java:60) at com.android.tradefed.testtype.suite.ITestSuite.split(ITestSuite.java:662) at com.android.compatibility.common.tradefed.testtype.retry.RetryFactoryTest.split(RetryFactoryTest.java:184) at com.android.tradefed.invoker.shard.ShardHelper.shardTest(ShardHelper.java:230) at com.android.tradefed.invoker.shard.ShardHelper.shardConfig(ShardHelper.java:86) at com.android.tradefed.invoker.shard.StrictShardHelper.shardConfig(StrictShardHelper.java:51) at com.android.tradefed.invoker.InvocationExecution.shardConfig(InvocationExecution.java:149) at com.android.tradefed.invoker.TestInvocation.invoke(TestInvocation.java:656) at com.android.tradefed.command.CommandScheduler$InvocationThread.run(CommandScheduler.java:566) |
首先,这是一个频繁full gc,但是并没有回收多少内存造成的报错,我们将gc日志也打出来
尝试将tools中的cts-tradefed里面改为
1 |
java $RDBG_FLAG -Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -cp ${JAR_PATH} -DCTS_ROOT=${CTS_ROOT} com.android.compatibility.common.tradefed.command.CompatibilityConsole "$@" |
重新复现,打出如下log:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 |
run retry --retry 1 --shard-count 6 -s 45f971f -s 5b01edc -s 9485acea -s 9485acda -s 948bac82 -s 948bac62 cts-tf > [GC (Allocation Failure) [PSYoungGen: 69229K->8168K(74752K)] 69237K->8256K(245760K), 0.0052029 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] [GC (Allocation Failure) [PSYoungGen: 72680K->8632K(74752K)] 72768K->8728K(245760K), 0.0058188 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 73144K->10232K(139264K)] 73240K->11797K(310272K), 0.0099151 secs] [Times: user=0.07 sys=0.00, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 139256K->10216K(139264K)] 140821K->17685K(310272K), 0.0056568 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] [GC (Allocation Failure) [PSYoungGen: 139240K->17240K(272896K)] 146709K->24717K(443904K), 0.0097234 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] [GC (Allocation Failure) [PSYoungGen: 272216K->9303K(275968K)] 279693K->21077K(446976K), 0.0250198 secs] [Times: user=0.14 sys=0.01, real=0.02 secs] [GC (Allocation Failure) [PSYoungGen: 264279K->18535K(527872K)] 276053K->33526K(698880K), 0.0108739 secs] [Times: user=0.04 sys=0.02, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 526439K->6775K(529920K)] 541430K->27020K(700928K), 0.0066831 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 514679K->21239K(832512K)] 534924K->41500K(1003520K), 0.0085360 secs] [Times: user=0.03 sys=0.02, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 831735K->22703K(835072K)] 851996K->55692K(1006080K), 0.0237944 secs] [Times: user=0.11 sys=0.02, real=0.03 secs] [GC (Allocation Failure) [PSYoungGen: 833199K->2509K(1323520K)] 866188K->57568K(1494528K), 0.0079843 secs] [Times: user=0.04 sys=0.01, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 1298893K->2224K(1324032K)] 1353952K->59283K(1495040K), 0.0044128 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 1298608K->5632K(1369088K)] 1355667K->64823K(1540096K), 0.0037587 secs] [Times: user=0.01 sys=0.01, real=0.00 secs] [GC (Allocation Failure) [PSYoungGen: 1348096K->8516K(1369600K)] 1407287K->73123K(1540608K), 0.0105542 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] [GC (Allocation Failure) [PSYoungGen: 1350980K->26096K(1204736K)] 1415587K->304733K(1483776K), 0.1134238 secs] [Times: user=0.70 sys=0.10, real=0.11 secs] [Full GC (Ergonomics) [PSYoungGen: 26096K->7141K(1204736K)] [ParOldGen: 278637K->278802K(463360K)] 304733K->285943K(1668096K), [Metaspace: 19544K->19544K(1067008K)], 0.3347239 secs] [Times: user=2.19 sys=0.01, real=0.33 secs] [GC (Allocation Failure) [PSYoungGen: 1185765K->109552K(1288192K)] 1464567K->506492K(1751552K), 0.1187568 secs] [Times: user=0.78 sys=0.03, real=0.12 secs] [Full GC (Ergonomics) [PSYoungGen: 109552K->41822K(1288192K)] [ParOldGen: 396940K->463058K(751616K)] 506492K->504881K(2039808K), [Metaspace: 19544K->19544K(1067008K)], 0.7046475 secs] [Times: user=5.09 sys=0.02, real=0.71 secs] [GC (Metadata GC Threshold) [PSYoungGen: 407656K->69930K(1171456K)] 870714K->575073K(1923072K), 0.0845791 secs] [Times: user=0.50 sys=0.02, real=0.08 secs] [Full GC (Metadata GC Threshold) [PSYoungGen: 69930K->0K(1171456K)] [ParOldGen: 505142K->573390K(946176K)] 575073K->573390K(2117632K), [Metaspace: 33176K->33176K(1079296K)], 0.9029356 secs] [Times: user=6.67 sys=0.04, real=0.90 secs] [GC (Allocation Failure) [PSYoungGen: 1061888K->18008K(1229824K)] 1635278K->591406K(2176000K), 0.0101945 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] [GC (Metadata GC Threshold) [PSYoungGen: 847656K->24174K(1211392K)] 1421054K->597580K(2157568K), 0.0188445 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] [Full GC (Metadata GC Threshold) [PSYoungGen: 24174K->0K(1211392K)] [ParOldGen: 573406K->586836K(1096192K)] 597580K->586836K(2307584K), [Metaspace: 55555K->55555K(1099776K)], 0.3028041 secs] [Times: user=2.13 sys=0.00, real=0.31 secs] [GC (Allocation Failure) [PSYoungGen: 1067008K->156134K(1223168K)] 1653844K->1121214K(2319360K), 0.1780735 secs] [Times: user=1.21 sys=0.11, real=0.18 secs] [Full GC (Ergonomics) [PSYoungGen: 156134K->19416K(1223168K)] [ParOldGen: 965079K->1096069K(1870336K)] 1121214K->1115486K(3093504K), [Metaspace: 70120K->70120K(1112064K)], 2.1907503 secs] [Times: user=11.86 sys=0.06, real=2.19 secs] [GC (Allocation Failure) [PSYoungGen: 1086424K->174561K(640512K)] 2182494K->1840018K(2510848K), 0.3123986 secs] [Times: user=2.16 sys=0.18, real=0.31 secs] [Full GC (Ergonomics) [PSYoungGen: 174561K->0K(640512K)] [ParOldGen: 1665456K->1832562K(2796544K)] 1840018K->1832562K(3437056K), [Metaspace: 70120K->70120K(1112064K)], 2.0454693 secs] [Times: user=8.06 sys=0.09, real=2.05 secs] [GC (Allocation Failure) [PSYoungGen: 465920K->325538K(931840K)] 2298482K->2158101K(3728384K), 0.0888857 secs] [Times: user=0.66 sys=0.00, real=0.09 secs] [GC (Allocation Failure) [PSYoungGen: 791458K->318690K(931840K)] 2624021K->2477357K(3728384K), 0.1941567 secs] [Times: user=1.38 sys=0.07, real=0.20 secs] [GC (Allocation Failure) [PSYoungGen: 784610K->324642K(931840K)] 2943277K->2802237K(3728384K), 0.1914071 secs] [Times: user=1.33 sys=0.13, real=0.19 secs] [Full GC (Ergonomics) [PSYoungGen: 324642K->3579K(931840K)] [ParOldGen: 2477595K->2796122K(2796544K)] 2802237K->2799702K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 4.2754003 secs] [Times: user=21.15 sys=0.16, real=4.27 secs] [Full GC (Ergonomics) [PSYoungGen: 469499K->317748K(931840K)] [ParOldGen: 2796122K->2796308K(2796544K)] 3265622K->3114057K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 2.1786764 secs] [Times: user=13.12 sys=0.02, real=2.18 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->421125K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3217434K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.8483104 secs] [Times: user=12.65 sys=0.02, real=1.85 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->452682K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3248990K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.8304428 secs] [Times: user=12.95 sys=0.03, real=1.84 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->453492K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3249801K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.8404874 secs] [Times: user=13.18 sys=0.01, real=1.84 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->459865K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3256173K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 5.8066516 secs] [Times: user=40.27 sys=0.05, real=5.81 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->460566K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3256874K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.8195868 secs] [Times: user=13.06 sys=0.00, real=1.82 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->460623K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3256931K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.6723615 secs] [Times: user=11.94 sys=0.02, real=1.67 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->464483K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3260791K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.8485084 secs] [Times: user=13.24 sys=0.00, real=1.85 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->464888K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3261196K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.8069882 secs] [Times: user=12.92 sys=0.00, real=1.81 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465369K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3261677K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.7872715 secs] [Times: user=12.79 sys=0.01, real=1.78 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465582K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3261890K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.5752262 secs] [Times: user=11.10 sys=0.01, real=1.57 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465733K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3262041K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.6507156 secs] [Times: user=11.71 sys=0.01, real=1.65 secs] [Full GC (Ergonomics) [PSYoungGen: 465907K->465771K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262215K->3262080K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.5695637 secs] [Times: user=11.09 sys=0.01, real=1.56 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465822K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3262130K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.5955533 secs] [Times: user=11.32 sys=0.00, real=1.60 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465887K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3262195K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.5837765 secs] [Times: user=11.24 sys=0.00, real=1.58 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465896K(931840K)] [ParOldGen: 2796308K->2796308K(2796544K)] 3262228K->3262204K(3728384K), [Metaspace: 70120K->70120K(1112064K)], 1.7875755 secs] [Times: user=12.81 sys=0.02, real=1.79 secs] [Full GC (Allocation Failure) [PSYoungGen: 465896K->454120K(931840K)] [ParOldGen: 2796308K->2796201K(2796544K)] 3262204K->3250322K(3728384K), [Metaspace: 70120K->70015K(1112064K)], 2.7476680 secs] [Times: user=16.59 sys=0.03, real=2.75 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->455662K(931840K)] [ParOldGen: 2796219K->2796179K(2796544K)] 3262139K->3251841K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 9.0061886 secs] [Times: user=64.63 sys=0.22, real=9.01 secs] [Full GC (Ergonomics) [PSYoungGen: 465917K->457887K(931840K)] [ParOldGen: 2796179K->2796163K(2796544K)] 3262096K->3254050K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5904952 secs] [Times: user=11.19 sys=0.01, real=1.59 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->463660K(931840K)] [ParOldGen: 2796163K->2796163K(2796544K)] 3262083K->3259823K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6110997 secs] [Times: user=11.38 sys=0.01, real=1.61 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465113K(931840K)] [ParOldGen: 2796163K->2796163K(2796544K)] 3262083K->3261276K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6458507 secs] [Times: user=11.83 sys=0.01, real=1.64 secs] [Full GC (Ergonomics) [PSYoungGen: 465806K->465719K(931840K)] [ParOldGen: 2796163K->2796163K(2796544K)] 3261969K->3261883K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6579255 secs] [Times: user=11.79 sys=0.01, real=1.66 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465720K(931840K)] [ParOldGen: 2796422K->2796420K(2796544K)] 3262342K->3262141K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 8.4967094 secs] [Times: user=60.98 sys=0.06, real=8.50 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465721K(931840K)] [ParOldGen: 2796420K->2796419K(2796544K)] 3262340K->3262140K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.8118774 secs] [Times: user=12.98 sys=0.00, real=1.81 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465593K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262013K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5768498 secs] [Times: user=11.15 sys=0.01, real=1.58 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465618K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262037K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.7890317 secs] [Times: user=12.83 sys=0.00, real=1.79 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465641K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262061K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5872775 secs] [Times: user=11.27 sys=0.01, real=1.58 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465642K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262061K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5884750 secs] [Times: user=11.28 sys=0.02, real=1.59 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465675K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262094K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5860467 secs] [Times: user=11.21 sys=0.01, real=1.59 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465712K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262131K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5936147 secs] [Times: user=11.33 sys=0.01, real=1.59 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465709K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262128K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6066747 secs] [Times: user=11.41 sys=0.01, real=1.60 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465887K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262307K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5937984 secs] [Times: user=11.28 sys=0.01, real=1.60 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465919K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262339K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6830851 secs] [Times: user=11.99 sys=0.01, real=1.68 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465919K(931840K)] [ParOldGen: 2796419K->2796419K(2796544K)] 3262339K->3262339K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5858877 secs] [Times: user=11.23 sys=0.01, real=1.58 secs] [Full GC (Ergonomics) [PSYoungGen: 465919K->465919K(931840K)] [ParOldGen: 2796421K->2796421K(2796544K)] 3262341K->3262341K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6499274 secs] [Times: user=11.71 sys=0.01, real=1.66 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465919K(931840K)] [ParOldGen: 2796423K->2796423K(2796544K)] 3262343K->3262343K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5922608 secs] [Times: user=11.29 sys=0.00, real=1.59 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465919K(931840K)] [ParOldGen: 2796423K->2796423K(2796544K)] 3262343K->3262343K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5766683 secs] [Times: user=11.18 sys=0.02, real=1.57 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465920K(931840K)] [ParOldGen: 2796425K->2796425K(2796544K)] 3262345K->3262345K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5808111 secs] [Times: user=11.21 sys=0.00, real=1.58 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465920K(931840K)] [ParOldGen: 2796427K->2796427K(2796544K)] 3262347K->3262347K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6686178 secs] [Times: user=11.79 sys=0.01, real=1.68 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465920K(931840K)] [ParOldGen: 2796429K->2796429K(2796544K)] 3262349K->3262349K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5804707 secs] [Times: user=11.21 sys=0.01, real=1.58 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465920K(931840K)] [ParOldGen: 2796431K->2796431K(2796544K)] 3262351K->3262351K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.5736846 secs] [Times: user=11.17 sys=0.01, real=1.57 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465920K(931840K)] [ParOldGen: 2796433K->2796433K(2796544K)] 3262353K->3262353K(3728384K), [Metaspace: 70019K->70019K(1112064K)], 1.6448586 secs] [Times: user=11.74 sys=0.00, real=1.65 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465903K(931840K)] [ParOldGen: 2796435K->2796435K(2796544K)] 3262355K->3262339K(3728384K), [Metaspace: 70019K->70015K(1112064K)], 1.5992428 secs] [Times: user=11.32 sys=0.00, real=1.59 secs] java.lang.OutOfMemoryError: GC overhead limit exceeded Dumping heap to java_pid14450.hprof ... Heap dump file created [5638377967 bytes in 32.001 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->465904K(931840K)] [ParOldGen: 2796435K->2796434K(2796544K)] 3262355K->3262338K(3728384K), [Metaspace: 70015K->70015K(1112064K)], 1.7980229 secs] [Times: user=12.93 sys=0.03, real=1.79 secs] [Full GC (Ergonomics) [PSYoungGen: 465920K->0K(931840K)] [ParOldGen: 2796434K->9739K(1840128K)] 3262354K->9739K(2771968K), [Metaspace: 70015K->70015K(1112064K)], 0.0714630 secs] [Times: user=0.14 sys=0.03, real=0.07 secs] cts-tf > 10-31 11:30:00 E/CommandScheduler: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1747) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.put(HashMap.java:612) at java.util.HashSet.add(HashSet.java:220) at java.util.AbstractCollection.addAll(AbstractCollection.java:344) at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:452) at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:549) at com.android.tradefed.config.OptionCopier.copyOptions(OptionCopier.java:49) at com.android.tradefed.config.OptionCopier.copyOptionsNoThrow(OptionCopier.java:60) at com.android.tradefed.testtype.suite.ITestSuite.split(ITestSuite.java:662) at com.android.compatibility.common.tradefed.testtype.retry.RetryFactoryTest.split(RetryFactoryTest.java:184) at com.android.tradefed.invoker.shard.ShardHelper.shardTest(ShardHelper.java:230) at com.android.tradefed.invoker.shard.ShardHelper.shardConfig(ShardHelper.java:86) at com.android.tradefed.invoker.shard.StrictShardHelper.shardConfig(StrictShardHelper.java:51) at com.android.tradefed.invoker.InvocationExecution.shardConfig(InvocationExecution.java:149) at com.android.tradefed.invoker.TestInvocation.invoke(TestInvocation.java:656) at com.android.tradefed.command.CommandScheduler$InvocationThread.run(CommandScheduler.java:566) |
GC overhead limt exceed检查是Hotspot VM 1.6定义的一个策略,通过统计GC时间来预测是否要OOM了,提前抛出异常,防止OOM发生。Sun 官方对此的定义是:“并行/并发回收器在GC回收时间过长时会抛出OutOfMemroyError。过长的定义是,超过98%的时间用来做GC并且回收了不到2%的堆内存。用来避免内存过小造成应用不能正常工作。“
而我们发现在抛出异常之前,确实频繁出现了full gc的日志,下面选取两个典型日志看一下含义:
[GC (Allocation Failure) [PSYoungGen: 1350980K->26096K(1204736K)] 1415587K->304733K(1483776K), 0.1134238 secs] [Times: user=0.70 sys=0.10, real=0.11 secs]
这个属于minor gc的日志:
- GC表示这是一次Minor GC(新生代垃圾收集)
2.[PSYoungGen: 1350980K->26096K(1204736K)] PSYoungGen表示新生代使用的是多线程垃圾收集器Parallel Scavenge,1350980K为gc前新生代占用空间的大小,26096K表示gc后占用空间的大小,
1204736K为新生代总大小
注:HotSpot JVM把新生代分为了三部分:1个Eden区和2个Survivor区(分别叫from和to)。默认比例为8:1,为啥默认会是这个比例,接下来我们会聊到。一般情况下,新创建的对象都会被分配到Eden区(一些大对象特殊处理),这些对象经过第一次Minor GC后,如果仍然存活,将会被移到Survivor区。对象在Survivor区中每熬过一次Minor GC,年龄就会增加1岁,当它的年龄增加到一定程度时,就会被移动到年老代中。
3.1415587K->304733K(1483776K) 1415587K: gc前堆占用总空间 304733K: gc后堆占用总空间 1483776K:当前堆的总大小
由()中的两个值可以算出老年代的大小,如本例中为: 1483776K - 1204736K
4.0.1134238 secs 本次gc消耗时间
- [Times: user=0.70 sys=0.10, real=0.11 secs]
user- 表示在此次垃圾回收过程中,所有GC线程消耗的CPU时间之和
sys - 表示GC过程中操作系统调用和系统等等事件所消耗的时间
real - 应用暂停的总时间。在并行GC中,这个数值应该接近于(user + sys) / GC线程数。即单个核上平均的暂停时间 。由于某些过程是不能并行执行的,所以这个值会比刚才求的均值略高
[Full GC (Ergonomics) [PSYoungGen: 465920K->465903K(931840K)] [ParOldGen: 2796435K->2796435K(2796544K)] 3262355K->3262339K(3728384K), [Metaspace: 70019K->70015K(1112064K)], 1.5992428 secs] [Times: user=11.32 sys=0.00, real=1.59 secs]
代表full gc,大体的格式类同,以三个数字为一组,分别代表新生代,老年代,总堆以及元数据区;从上面的log可以看出,报错前的full gc耗时1.5s,但并没有回收多少空间,最后导致错误:
问题分析
从log中我们可以看到这一行log,Dumping heap to java_pid14450.hprof ... 那么就可以用MAT分析得到的hprof文件
从上面的图中可以看到CommandScheduler中的InvocationThread中的run方法中持有大量的局部变量会导致最终的gc overhead limit exceceeded.注意: Histogram是从类的角度来总结对上内存分配状况,Dominator Tree可以是从实例的角度来总结,Path To GC Roots 可以看到引用路径,导致对象没有被gc的原因
大致的引用路径如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
496 private class InvocationThread extends Thread{ ... 514 515 public InvocationThread(String name, IInvocationContext invocationContext, 516 ExecutableCommand command, IScheduledInvocationListener... listeners){ 517 // create a thread group so LoggerRegistry can identify this as an invocationThread 518 super(new ThreadGroup(name), name); 519 mListeners = listeners; 520 mInvocationContext = invocationContext; 521 mCmd = command; 522 mInvocation = createRunInstance(); 523 524 // Daemon timer 525 mExecutionTimer = new Timer(true); 526 mInvocationThreadMonitor = new InvocationThreadMonitor(this); 527 } 528 529 public long getStartTime(){ 530 return mStartTime; 531 } 532 533 @Override 534 public void run(){ ... 566 instance.invoke(mInvocationContext, config, 567 new Rescheduler(mCmd.getCommandTracker()), mListeners); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 |
596 @Override 597 public void invoke( 598 IInvocationContext context, 599 IConfiguration config, 600 IRescheduler rescheduler, 601 ITestInvocationListener... extraListeners) 602 throws DeviceNotAvailableException, Throwable { 603 List<ITestInvocationListener> allListeners = 604 new ArrayList<>(config.getTestInvocationListeners().size() + extraListeners.length); 605 allListeners.addAll(config.getTestInvocationListeners()); 606 allListeners.addAll(Arrays.asList(extraListeners)); 607 ITestInvocationListener listener = 608 new LogSaverResultForwarder(config.getLogSaver(), allListeners); 609 IInvocationExecution invocationPath = 610 createInvocationExec(config.getConfigurationDescription().shouldUseSandbox()); 611 612 // Create the Guice scope 613 InvocationScope scope = getInvocationScope(); 614 scope.enter(); 615 // Seed our TF objects to the Guice scope 616 scope.seedConfiguration(config); 617 try { 618 mStatus = "fetching build"; 619 config.getLogOutput().init(); 620 getLogRegistry().registerLogger(config.getLogOutput()); 621 for (String deviceName : context.getDeviceConfigNames()) { 622 context.getDevice(deviceName).clearLastConnectedWifiNetwork(); 623 context.getDevice(deviceName) 624 .setOptions(config.getDeviceConfigByName(deviceName).getDeviceOptions()); 625 if (config.getDeviceConfigByName(deviceName) 626 .getDeviceOptions() 627 .isLogcatCaptureEnabled()) { 628 if (!(context.getDevice(deviceName).getIDevice() instanceof StubDevice)) { 629 context.getDevice(deviceName).startLogcat(); 630 } 631 } 632 } 633 634 String cmdLineArgs = config.getCommandLine(); 635 if (cmdLineArgs != null) { 636 CLog.i("Invocation was started with cmd: %s", cmdLineArgs); 637 } 638 639 long start = System.currentTimeMillis(); 640 boolean providerSuccess = 641 invokeFetchBuild(context, config, rescheduler, listener, invocationPath); 642 long fetchBuildDuration = System.currentTimeMillis() - start; 643 context.addInvocationTimingMetric(IInvocationContext.TimingEvent.FETCH_BUILD, 644 fetchBuildDuration); 645 CLog.d("Fetch build duration: %s", TimeUtil.formatElapsedTime(fetchBuildDuration)); 646 if (!providerSuccess) { 647 return; 648 } 649 650 mStatus = "sharding"; 651 boolean sharding = invocationPath.shardConfig(config, context, rescheduler); ... } |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
196 /** 197 * Attempt to shard given {@link IRemoteTest}. 198 * 199 * @param shardableTests the list of {@link IRemoteTest}s to add to 200 * @param test the {@link IRemoteTest} to shard 201 * @param shardCount attempted number of shard, can be null. 202 * @param context the {@link IInvocationContext} of the current invocation. 203 * @return <code>true</code> if test was sharded 204 */ 205 private static boolean shardTest( 206 List<IRemoteTest> shardableTests, 207 IRemoteTest test, 208 Integer shardCount, 209 IInvocationContext context){ 210 boolean isSharded = false; 211 if (test instanceof IShardableTest) { 212 // inject device and build since they might be required to shard. 213 if (test instanceof IBuildReceiver) { 214 ((IBuildReceiver) test).setBuild(context.getBuildInfos().get(0)); 215 } 216 if (test instanceof IDeviceTest) { 217 ((IDeviceTest) test).setDevice(context.getDevices().get(0)); 218 } 219 if (test instanceof IMultiDeviceTest) { 220 ((IMultiDeviceTest) test).setDeviceInfos(context.getDeviceBuildMap()); 221 } 222 if (test instanceof IInvocationContextReceiver) { 223 ((IInvocationContextReceiver) test).setInvocationContext(context); 224 } 225 226 IShardableTest shardableTest = (IShardableTest) test; 227 Collection<IRemoteTest> shards = null; 228 // Give the shardCount hint to tests if they need it. 229 if (shardCount != null) { 230 shards = shardableTest.split(shardCount); 231 } else { 232 shards = shardableTest.split(); 233 } 234 if (shards != null) { 235 shardableTests.addAll(shards); 236 isSharded = true; 237 } 238 } 239 if (!isSharded) { 240 shardableTests.add(test); 241 } 242 return isSharded; 243 } |
1 2 3 4 5 6 7 8 9 10 11 |
180 @Override 181 public Collection<IRemoteTest> split(int shardCountHint){ 182 try { 183 CompatibilityTestSuite test = loadSuite(); 184 return test.split(shardCountHint); 185 } catch (DeviceNotAvailableException e) { 186 CLog.e("Failed to shard the retry run."); 187 CLog.e(e); 188 } 189 return null; 190 } |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
621 /** {@inheritDoc} */ 622 @Override 623 public Collection<IRemoteTest> split(int shardCountHint){ 624 if (shardCountHint <= 1 || mIsSharded) { 625 // cannot shard or already sharded 626 return null; 627 } 628 629 LinkedHashMap<String, IConfiguration> runConfig = loadAndFilter(); 630 if (runConfig.isEmpty()) { 631 CLog.i("No config were loaded. Nothing to run."); 632 return null; 633 } 634 injectInfo(runConfig); 635 636 // We split individual tests on double the shardCountHint to provide better average. 637 // The test pool mechanism prevent this from creating too much overhead. 638 List<ModuleDefinition> splitModules = 639 ModuleSplitter.splitConfiguration( 640 runConfig, shardCountHint, mShouldMakeDynamicModule); 641 runConfig.clear(); 642 runConfig = null; 643 // create an association of one ITestSuite <=> one ModuleDefinition as the smallest 644 // execution unit supported. 645 List<IRemoteTest> splitTests = new ArrayList<>(); 646 for (ModuleDefinition m : splitModules) { 647 ITestSuite suite = createInstance(); 648 OptionCopier.copyOptionsNoThrow(this, suite); 649 suite.mIsSharded = true; 650 suite.mDirectModule = m; 651 splitTests.add(suite); 652 } 653 // return the list of ITestSuite with their ModuleDefinition assigned 654 return splitTests; 655 } |
最终导致持有大量的CompatibilityTestSuite,导致case fail不能6台case一起retry
问题总结
因为是属于google的cts测试框架问题,所以先向google反馈;临时的解决方案,减少retry机器台数,将-Xmx4g改为-Xmx8g
1 |
java $RDBG_FLAG -Xmx8g -XX:+HeapDumpOnOutOfMemoryError -cp ${JAR_PATH} -DCTS_ROOT=${CTS_ROOT} com.android.compatibility.common.tradefed.command.CompatibilityConsole "$@" |
环境配置
在上面的临时修改之后,可以在运行时抓一下hprof文件与gc overhead时产生的文件进行对比,但是可能需要一些环境配置,这里记录下我遇到的一些问题,如要查看的是pid = 8154
sudo jmap -heap 8154
结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
Attaching to process ID 8154, please wait... Debugger attached successfully. Server compiler detected. JVM version is 25.181-b13 using thread-local object allocation. Parallel GC with 8 thread(s) Heap Configuration: MinHeapFreeRatio = 0 MaxHeapFreeRatio = 100 MaxHeapSize = 4294967296 (4096.0MB) NewSize = 87031808 (83.0MB) MaxNewSize = 1431306240 (1365.0MB) OldSize = 175112192 (167.0MB) NewRatio = 2 SurvivorRatio = 8 MetaspaceSize = 21807104 (20.796875MB) CompressedClassSpaceSize = 1073741824 (1024.0MB) MaxMetaspaceSize = 17592186044415 MB G1HeapRegionSize = 0 (0.0MB) Heap Usage: Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.tools.jmap.JMap.runTool(JMap.java:201) at sun.tools.jmap.JMap.main(JMap.java:130) Caused by: java.lang.RuntimeException: unknown CollectedHeap type : class sun.jvm.hotspot.gc_interface.CollectedHeap at sun.jvm.hotspot.tools.HeapSummary.run(HeapSummary.java:144) at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260) at sun.jvm.hotspot.tools.Tool.start(Tool.java:223) at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49) ... 6 more |
因为这里用到了sun.jvm下的debug包,默认openJDK的安装中不包含,要么装个oracle的jdk,要么下载一个openjdk的debuginfo,这里安装正式jdk
1.官网下载压缩文件
- 手动解压配置
sudo update-alternatives --install /usr/bin/java java /home/weijuncheng/java8/jdk1.8.0_191/bin/java 700
sudo update-alternatives --install /usr/bin/javac javac /home/weijuncheng/java8/jdk1.8.0_191/bin/javac 700
sudo update-alternatives --install /usr/bin/jar jar /home/weijuncheng/java8/jdk1.8.0_191/bin/jar 700
- sudo update-alternatives --config java 切换到jdk 1.8
然后又报错了
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
Error attaching to process: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 25.181-b13. Target VM is 25.191-b12 sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 25.181-b13. Target VM is 25.191-b12 at sun.jvm.hotspot.HotSpotAgent.setupVM(HotSpotAgent.java:435) at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:305) at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140) at sun.jvm.hotspot.tools.Tool.start(Tool.java:185) at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.tools.jmap.JMap.runTool(JMap.java:201) at sun.tools.jmap.JMap.main(JMap.java:130) Caused by: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 25.181-b13. Target VM is 25.191-b12 at sun.jvm.hotspot.runtime.VM.checkVMVersion(VM.java:227) at sun.jvm.hotspot.runtime.VM.<init>(VM.java:294) at sun.jvm.hotspot.runtime.VM.initialize(VM.java:370) at sun.jvm.hotspot.HotSpotAgent.setupVM(HotSpotAgent.java:431) |
原因:
cd /etc/alternatives
lrwxrwxrwx 1 root root 45 11月 1 14:41 java -> /home/weijuncheng/java8/jdk1.8.0_191/bin/java
lrwxrwxrwx 1 root root 42 1月 22 2018 jmap -> /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap
两者不一样,所以会报版本不同的错误
sudo update-alternatives --config jmap
选择jdk 1.8
再执行,则没有报错了
最后jmap -dump:live,format=b,file=heap.hprof pid即可为相应pid生成hprof文件