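Problem analysis
This is not the first time this problem has appeared (see CTS Problem Analysis 10). At the time there were more urgent issues, so the analysis stopped at one finding: holding a large number of CompatibilityTestSuite instances causes the error during retry.
It has now happened again, so a proper investigation is needed to make sure it does not recur.
Retry command: run retry --retry 0 --shard-count 2 -s 7c6252f -s 7c62472
Error log in the terminal: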
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    Dumping heap to java_pid26338.hprof ...
    Heap dump file created [5553157593 bytes in 101.829 secs]
    01-29 16:09:47 E/CommandScheduler: GC overhead limit exceeded
    java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.HashMap.newNode(HashMap.java:1747)
        at java.util.HashMap.putVal(HashMap.java:631)
        at java.util.HashMap.put(HashMap.java:612)
        at java.util.HashSet.add(HashSet.java:220)
        at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
        at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:452)
        at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:549)
        at com.android.tradefed.config.OptionCopier.copyOptions(OptionCopier.java:49)
        at com.android.tradefed.config.OptionCopier.copyOptionsNoThrow(OptionCopier.java:60)
        at com.android.tradefed.testtype.suite.ITestSuite.split(ITestSuite.java:662)
        at com.android.compatibility.common.tradefed.testtype.retry.RetryFactoryTest.split(RetryFactoryTest.java:122)
        at com.android.tradefed.invoker.shard.ShardHelper.shardTest(ShardHelper.java:123)
        at com.android.tradefed.invoker.shard.ShardHelper.shardConfig(ShardHelper.java:30)
        at com.android.tradefed.invoker.shard.StrictShardHelper.shardConfig(StrictShardHelper.java:51)
        at com.android.tradefed.invoker.InvocationExecution.shardConfig(InvocationExecution.java:149)
        at com.android.tradefed.invoker.TestInvocation.invoke(TestInvocation.java:656)
        at com.android.tradefed.command.CommandScheduler$InvocationThread.run(CommandScheduler.java:1357)
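First, the log shows the stack at the moment of failure; following it shows why so much memory was in use.
How the data structures are organized during a multi-device retry
The earlier analysis showed that the problem comes from a large number of CompatibilityTestSuite instances, each holding a large number of exclude-case entries. Following the stack, we now walk through how the CTS data structures are organized during a multi-device retry.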
tools/tradefederation/core/src/com/android/tradefed/invoker/shard/ShardHelper.java
    /**
     * Attempt to shard the configuration into sub-configurations, to be re-scheduled to run on
     * multiple resources in parallel.
     *
     * <p>A successful shard action renders the current config empty, and invocation should not
     * proceed.
     *
     * @see IShardableTest
     * @see IRescheduler
     * @param config the current {@link IConfiguration}.
     * @param context the {@link IInvocationContext} holding the tests information.
     * @param rescheduler the {@link IRescheduler}
     * @return true if test was sharded. Otherwise return <code>false</code>
     */
    @Override
    public boolean shardConfig(
            IConfiguration config, IInvocationContext context, IRescheduler rescheduler) {
        List<IRemoteTest> shardableTests = new ArrayList<IRemoteTest>();
        boolean isSharded = false;
        Integer shardCount = config.getCommandOptions().getShardCount();
        for (IRemoteTest test : config.getTests()) {
            // shardTest splits the test for the retry. At this point the test holds almost
            // nothing: only the known failures from cts-known-failures.xml, kept in the
            // exclude list.
            isSharded |= shardTest(shardableTests, test, shardCount, context);
        }
        if (!isSharded) {
            return false;
        }
        // shard this invocation!
        // create the TestInvocationListener that will collect results from all the shards,
        // and forward them to the original set of listeners (minus any ISharddableListeners)
        // once all shards complete
        int expectedShard = shardableTests.size();
        if (shardCount != null) {
            expectedShard = Math.min(shardCount, shardableTests.size());
        }
        ShardMasterResultForwarder resultCollector =
                new ShardMasterResultForwarder(buildMasterShardListeners(config), expectedShard);

        resultCollector.invocationStarted(context);
        synchronized (shardableTests) {
            // When shardCount is available only create 1 poller per shard
            // TODO: consider aggregating both case by picking a predefined shardCount if not
            // available (like 4) for autosharding.
            if (shardCount != null) {
                // We shuffle the tests for best results: avoid having the same module sub-tests
                // contiguously in the list.
                Collections.shuffle(shardableTests);
                int maxShard = Math.min(shardCount, shardableTests.size());
                CountDownLatch tracker = new CountDownLatch(maxShard);
                for (int i = 0; i < maxShard; i++) {
                    IConfiguration shardConfig = config.clone();
                    shardConfig.setTest(new TestsPoolPoller(shardableTests, tracker));
                    rescheduleConfig(shardConfig, config, context, rescheduler, resultCollector);
                }
            } else {
                CountDownLatch tracker = new CountDownLatch(shardableTests.size());
                for (IRemoteTest testShard : shardableTests) {
                    CLog.i("Rescheduling sharded config...");
                    IConfiguration shardConfig = config.clone();
                    if (config.getCommandOptions().shouldUseDynamicSharding()) {
                        shardConfig.setTest(new TestsPoolPoller(shardableTests, tracker));
                    } else {
                        shardConfig.setTest(testShard);
                    }
                    rescheduleConfig(shardConfig, config, context, rescheduler, resultCollector);
                }
            }
        }
        // clean up original builds
        for (String deviceName : context.getDeviceConfigNames()) {
            config.getDeviceConfigByName(deviceName)
                    .getBuildProvider()
                    .cleanUp(context.getBuildInfo(deviceName));
        }
        return true;
    }
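The shared-pool idea behind TestsPoolPoller can be pictured with a small stand-alone sketch. It is not Tradefed code: the class SharedPoolSketch and its methods are hypothetical, and it only assumes what the comments in shardConfig state, namely that each shard gets one poller and all pollers draw from the same list of tests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the shared test pool; not Tradefed code. */
class SharedPoolSketch {

    /** Removes and returns the next test, or null once the pool is empty. */
    static String poll(List<String> pool) {
        synchronized (pool) {
            return pool.isEmpty() ? null : pool.remove(0);
        }
    }

    /** Starts one worker per shard on the same pool; returns tests run per shard. */
    static int[] drain(List<String> pool, int shardCount) {
        int[] executed = new int[shardCount];
        Thread[] workers = new Thread[shardCount];
        for (int i = 0; i < shardCount; i++) {
            final int shard = i;
            workers[i] = new Thread(() -> {
                while (poll(pool) != null) {
                    executed[shard]++; // each slot is written by exactly one thread
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // after join, the worker's counter is visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining the pool", e);
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<String> pool = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            pool.add("module-" + i);
        }
        int[] executed = drain(pool, 2);
        System.out.println("shard 0 ran " + executed[0] + ", shard 1 ran " + executed[1]);
    }
}
```

How the ten entries are divided between the two workers depends on thread scheduling; the only guarantee in this sketch is that every entry is taken exactly once.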
    /**
     * Attempt to shard given {@link IRemoteTest}.
     *
     * @param shardableTests the list of {@link IRemoteTest}s to add to
     * @param test the {@link IRemoteTest} to shard
     * @param shardCount attempted number of shard, can be null.
     * @param context the {@link IInvocationContext} of the current invocation.
     * @return <code>true</code> if test was sharded
     */
    private static boolean shardTest(
            List<IRemoteTest> shardableTests,
            IRemoteTest test,
            Integer shardCount,
            IInvocationContext context) {
        boolean isSharded = false;
        if (test instanceof IShardableTest) {
            // inject device and build since they might be required to shard.
            if (test instanceof IBuildReceiver) {
                ((IBuildReceiver) test).setBuild(context.getBuildInfos().get(0));
            }
            if (test instanceof IDeviceTest) {
                ((IDeviceTest) test).setDevice(context.getDevices().get(0));
            }
            if (test instanceof IMultiDeviceTest) {
                ((IMultiDeviceTest) test).setDeviceInfos(context.getDeviceBuildMap());
            }
            if (test instanceof IInvocationContextReceiver) {
                ((IInvocationContextReceiver) test).setInvocationContext(context);
            }
            // the block above sets some properties on the test
            IShardableTest shardableTest = (IShardableTest) test;
            Collection<IRemoteTest> shards = null;
            // Give the shardCount hint to tests if they need it.
            if (shardCount != null) { // a multi-device retry specifies shardCount
                shards = shardableTest.split(shardCount); // calls RetryFactoryTest.split
            } else {
                shards = shardableTest.split();
            }
            if (shards != null) {
                shardableTests.addAll(shards);
                isSharded = true;
            }
        }
        if (!isSharded) {
            shardableTests.add(test);
        }
        return isSharded;
    }
test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/testtype/retry/RetryFactoryTest.java
    @Override
    public Collection<IRemoteTest> split(int shardCountHint) {
        try {
            CompatibilityTestSuite test = loadSuite();
            // note the two statements around this comment: this is where the data
            // structures get built
            return test.split(shardCountHint);
        } catch (DeviceNotAvailableException e) {
            CLog.e("Failed to shard the retry run.");
            CLog.e(e);
        }
        return null;
    }
Creating a CompatibilityTestSuite
    /**
     * Helper to create a {@link CompatibilityTestSuite} from previous results.
     */
    private CompatibilityTestSuite loadSuite() throws DeviceNotAvailableException {
        // Create a compatibility test and set it to run only what we want.
        CompatibilityTestSuite test = createTest();

        CompatibilityBuildHelper buildHelper = new CompatibilityBuildHelper(mBuildInfo);
        // Create the helper with all the options needed.
        RetryFilterHelper helper = createFilterHelper(buildHelper); // creates a RetryFilterHelper
        // TODO: we have access to the original command line, we should accommodate more re-run
        // scenario like when the original cts.xml config was not used.
        helper.validateBuildFingerprint(mDevice);
        helper.setCommandLineOptionsFor(test);
        helper.setCommandLineOptionsFor(this);
        helper.populateRetryFilters(); // exclude entries are added here

        try {
            OptionSetter setter = new OptionSetter(test);
            for (String moduleArg : mModuleArgs) {
                setter.setOptionValue("compatibility:module-arg", moduleArg);
            }
            for (String testArg : mTestArgs) {
                setter.setOptionValue("compatibility:test-arg", testArg);
            }
        } catch (ConfigurationException e) {
            throw new RuntimeException(e);
        }

        test.setIncludeFilter(helper.getIncludeFilters());
        test.setExcludeFilter(helper.getExcludeFilters());
        test.setDevice(mDevice);
        test.setBuild(mBuildInfo);
        test.setAbiName(mAbiName);
        test.setPrimaryAbiRun(mPrimaryAbiRun);
        test.setSystemStatusChecker(mStatusCheckers);
        test.setInvocationContext(mContext);
        test.setConfiguration(mMainConfiguration);
        // reset the retry id - Ensure that retry of retry does not throw
        test.resetRetryId();
        test.isRetry();
        // clean the helper
        helper.tearDown();
        return test;
    }
test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/util/RetryFilterHelper.java
    /**
     * Constructor for a {@link RetryFilterHelper}.
     *
     * @param build a {@link CompatibilityBuildHelper} describing the build.
     * @param sessionId The ID of the session to retry.
     * @param subPlan The name of a subPlan to be used. Can be null.
     * @param includeFilters The include module filters to apply
     * @param excludeFilters The exclude module filters to apply
     * @param abiName The name of abi to use. Can be null.
     * @param moduleName The name of the module to run. Can be null.
     * @param testName The name of the test to run. Can be null.
     * @param retryType The type of results to retry. Can be null.
     */
    public RetryFilterHelper(CompatibilityBuildHelper build, int sessionId, String subPlan,
            Set<String> includeFilters, Set<String> excludeFilters, String abiName,
            String moduleName, String testName, RetryType retryType) {
        this(build, sessionId);
        mSubPlan = subPlan;
        mIncludeFilters.addAll(includeFilters);
        mExcludeFilters.addAll(excludeFilters);
        mAbiName = abiName;
        mModuleName = moduleName;
        mTestName = testName;
        mRetryType = retryType;
    }
At this point mExcludeFilters still holds only the known failures recorded in cts-known-failures.xml; the key step is populateRetryFilters.
    /**
     * Populate mRetryIncludes and mRetryExcludes based on the options and the result set for
     * this instance of RetryFilterHelper.
     */
    public void populateRetryFilters() {
        mRetryIncludes = new HashSet<>(mIncludeFilters); // reset for each population
        mRetryExcludes = new HashSet<>(mExcludeFilters); // reset for each population
        if (RetryType.CUSTOM.equals(mRetryType)) {
            Set<String> customIncludes = new HashSet<>(mIncludeFilters);
            Set<String> customExcludes = new HashSet<>(mExcludeFilters);
            // a retry normally does not specify a subplan, so this branch is not taken
            if (mSubPlan != null) {
                ISubPlan retrySubPlan = SubPlanHelper.getSubPlanByName(mBuild, mSubPlan);
                customIncludes.addAll(retrySubPlan.getIncludeFilters());
                customExcludes.addAll(retrySubPlan.getExcludeFilters());
            }
            // If includes were added, only use those includes. Also use excludes added directly
            // or by subplan. Otherwise, default to normal retry.
            if (!customIncludes.isEmpty()) {
                mRetryIncludes.clear();
                mRetryIncludes.addAll(customIncludes);
                mRetryExcludes.addAll(customExcludes);
                return;
            }
        }
        // remove any extra filtering options
        // TODO(aaronholden) remove non-plan includes (e.g. those in cts-vendor-interface)
        // TODO(aaronholden) remove non-known-failure excludes
        mModuleName = null;
        mTestName = null;
        mSubPlan = null;
        populateFiltersBySubPlan();
        populatePreviousSessionFilters();
    }
Execution therefore reaches populateFiltersBySubPlan:
    /* Generation of filters based on previous sessions is implemented thoroughly in SubPlanHelper,
     * and retry filter generation is just a subset of the use cases for the subplan retry logic.
     * Use retry type to determine which result types SubPlanHelper targets. */
    public void populateFiltersBySubPlan() {
        SubPlanHelper retryPlanCreator = new SubPlanHelper();
        retryPlanCreator.setResult(getResult());
        if (RetryType.FAILED.equals(mRetryType)) {
            // retry only failed tests
            retryPlanCreator.addResultType(SubPlanHelper.FAILED);
        } else if (RetryType.NOT_EXECUTED.equals(mRetryType)) {
            // retry only not executed tests
            retryPlanCreator.addResultType(SubPlanHelper.NOT_EXECUTED);
        } else {
            // retry both failed and not executed tests
            retryPlanCreator.addResultType(SubPlanHelper.FAILED);
            retryPlanCreator.addResultType(SubPlanHelper.NOT_EXECUTED);
        }
        try {
            // the include and exclude lists built by SubPlanHelper end up in the
            // CompatibilityTestSuite
            ISubPlan retryPlan = retryPlanCreator.createSubPlan(mBuild);
            mRetryIncludes.addAll(retryPlan.getIncludeFilters());
            mRetryExcludes.addAll(retryPlan.getExcludeFilters());
        } catch (ConfigurationException e) {
            throw new RuntimeException("Failed to create subplan for retry", e);
        }
    }
test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/result/SubPlanHelper.java
createSubPlan is the crucial step: it extracts information from the report being retried into the include list (mIncludeFilters) and the exclude list (mExcludeFilters).
    /**
     * Create a subplan derived from a result.
     * <p/>
     * {@link Option} values must be set before this is called.
     * @param buildHelper
     * @return subplan
     * @throws ConfigurationException
     */
    public ISubPlan createSubPlan(CompatibilityBuildHelper buildHelper)
            throws ConfigurationException {
        setupFields(buildHelper);
        ISubPlan subPlan = new SubPlan();

        // add filters from previous session to track which tests must run
        subPlan.addAllIncludeFilters(mIncludeFilters);
        subPlan.addAllExcludeFilters(mExcludeFilters);
        if (mLastSubPlan != null) {
            ISubPlan lastSubPlan = SubPlanHelper.getSubPlanByName(buildHelper, mLastSubPlan);
            subPlan.addAllIncludeFilters(lastSubPlan.getIncludeFilters());
            subPlan.addAllExcludeFilters(lastSubPlan.getExcludeFilters());
        }
        if (mModuleName != null) {
            addIncludeToSubPlan(subPlan, new TestFilter(mAbiName, mModuleName, mTestName));
        }
        Set<TestStatus> statusesToRun = getStatusesToRun();
        for (IModuleResult module : mResult.getModules()) {
            if (shouldRunModule(module)) {
                TestFilter moduleInclude =
                        new TestFilter(module.getAbi(), module.getName(), null /*test*/);
                if (shouldRunEntireModule(module)) {
                    // include entire module
                    addIncludeToSubPlan(subPlan, moduleInclude); // every case in the module failed
                } else if (mResultTypes.contains(NOT_EXECUTED) && !module.isDone()) {
                    // add module include and test excludes
                    addIncludeToSubPlan(subPlan, moduleInclude);
                    for (ICaseResult caseResult : module.getResults()) {
                        for (ITestResult testResult : caseResult.getResults()) {
                            if (!statusesToRun.contains(testResult.getResultStatus())) {
                                TestFilter testExclude = new TestFilter(module.getAbi(),
                                        module.getName(), testResult.getFullName());
                                // the module did not finish (done = false)
                                addExcludeToSubPlan(subPlan, testExclude);
                            }
                        }
                    }
                } else {
                    // Not-executed tests should not be rerun and/or this module is completed
                    // In any such case, it suffices to add includes for each test to rerun
                    for (ICaseResult caseResult : module.getResults()) {
                        for (ITestResult testResult : caseResult.getResults()) {
                            if (statusesToRun.contains(testResult.getResultStatus())) {
                                TestFilter testInclude = new TestFilter(module.getAbi(),
                                        module.getName(), testResult.getFullName());
                                // the module finished, but some of its cases failed
                                addIncludeToSubPlan(subPlan, testInclude);
                            }
                        }
                    }
                }
            } else {
                // module should not run, exclude entire module
                TestFilter moduleExclude =
                        new TestFilter(module.getAbi(), module.getName(), null /*test*/);
                addExcludeToSubPlan(subPlan, moduleExclude); // module where everything passed
            }
        }
        return subPlan;
    }
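At this point it is clear why CompatibilityTestSuite holds so many exclude-case entries: CtsDeqpTestCases did not complete, and it was interrupted shortly before finishing. Because the module is marked as not done, every test that already has a result that will not be retried becomes its own exclude entry. This module has about 350,000 cases in total (counting only v7a or only v8a).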
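The "module not done" branch of createSubPlan can be reduced to a small model to show how the exclude count grows with the number of tests that already have a result. The sketch below is hypothetical: IncompleteModuleSketch, the status strings, and the test names are made up for illustration, and the filter layout "<abi> <module> <test>" is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the "module not done" branch; all names are hypothetical. */
class IncompleteModuleSketch {

    /** Returns one exclude filter for every test whose old status is not retried. */
    static List<String> excludesFor(String abi, String module, Map<String, String> results) {
        List<String> excludes = new ArrayList<>();
        for (Map.Entry<String, String> entry : results.entrySet()) {
            String status = entry.getValue();
            boolean retried = status.equals("fail") || status.equals("not_executed");
            if (!retried) {
                // assumed filter layout: "<abi> <module> <test>"
                excludes.add(abi + " " + module + " " + entry.getKey());
            }
        }
        return excludes;
    }

    public static void main(String[] args) {
        Map<String, String> results = new LinkedHashMap<>();
        for (int i = 0; i < 5; i++) {
            results.put("dEQP.hypothetical.case" + i, "pass");
        }
        results.put("dEQP.hypothetical.case5", "fail");
        List<String> excludes = excludesFor("arm64-v8a", "CtsDeqpTestCases", results);
        System.out.println(excludes.size() + " exclude filters for " + results.size() + " results");
    }
}
```

In this model five passed tests yield five exclude filters. A module interrupted near its end has a result for almost every test, so the exclude list is almost as long as the module itself.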
The remaining initialization of CompatibilityTestSuite is not the focus of this article and is skipped; next comes the logic of test.split(shardCountHint).
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/ITestSuite.java
    /** {@inheritDoc} */
    @Override
    public Collection<IRemoteTest> split(int shardCountHint) {
        if (shardCountHint <= 1 || mIsSharded) {
            // cannot shard or already sharded
            return null;
        }

        LinkedHashMap<String, IConfiguration> runConfig = loadAndFilter();
        if (runConfig.isEmpty()) {
            CLog.i("No config were loaded. Nothing to run.");
            return null;
        }
        injectInfo(runConfig);

        // We split individual tests on double the shardCountHint to provide better average.
        // The test pool mechanism prevent this from creating too much overhead.
        List<ModuleDefinition> splitModules =
                ModuleSplitter.splitConfiguration(
                        runConfig, shardCountHint, mShouldMakeDynamicModule);
        runConfig.clear();
        runConfig = null;
        // create an association of one ITestSuite <=> one ModuleDefinition as the smallest
        // execution unit supported.
        List<IRemoteTest> splitTests = new ArrayList<>();
        for (ModuleDefinition m : splitModules) {
            ITestSuite suite = createInstance();
            OptionCopier.copyOptionsNoThrow(this, suite);
            suite.mIsSharded = true;
            suite.mDirectModule = m;
            splitTests.add(suite);
        }
        // return the list of ITestSuite with their ModuleDefinition assigned
        return splitTests;
    }
First, the logic of loadAndFilter:
    private LinkedHashMap<String, IConfiguration> loadAndFilter() {
        LinkedHashMap<String, IConfiguration> runConfig = loadTests();
        if (runConfig.isEmpty()) {
            CLog.i("No config were loaded. Nothing to run.");
            return runConfig;
        }
        if (mModuleMetadataIncludeFilter.isEmpty() && mModuleMetadataExcludeFilter.isEmpty()) {
            return runConfig;
        }
        LinkedHashMap<String, IConfiguration> filteredConfig = new LinkedHashMap<>();
        for (Entry<String, IConfiguration> config : runConfig.entrySet()) {
            if (!filterByConfigMetadata(
                    config.getValue(),
                    mModuleMetadataIncludeFilter,
                    mModuleMetadataExcludeFilter)) {
                // if the module config did not pass the metadata filters, it's excluded
                // from execution.
                continue;
            }
            if (!filterByRunnerType(config.getValue(), mAllowedRunners)) {
                // if the module config did not pass the runner type filter, it's excluded from
                // execution.
                continue;
            }
            filterPreparers(config.getValue(), mAllowedPreparers);
            filteredConfig.put(config.getKey(), config.getValue());
        }
        runConfig.clear();
        return filteredConfig;
    }
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/BaseTestSuite.java
loadTests first reorganizes mIncludeFilters and mExcludeFilters into mIncludeFiltersParsed and mExcludeFiltersParsed:
    /** {@inheritDoc} */
    @Override
    public LinkedHashMap<String, IConfiguration> loadTests() {
        try {
            File testsDir = getTestsDir();
            setupFilters(testsDir);
            Set<IAbi> abis = getAbis(getDevice());

            // Create and populate the filters here
            SuiteModuleLoader.addFilters(mIncludeFilters, mIncludeFiltersParsed, abis);
            // parsed into <String, List> pairs: the key is the module name, the list holds
            // that module's tests
            SuiteModuleLoader.addFilters(mExcludeFilters, mExcludeFiltersParsed, abis);

            CLog.d(
                    "Initializing ModuleRepo\nABIs:%s\n"
                            + "Test Args:%s\nModule Args:%s\nIncludes:%s\nExcludes:%s",
                    abis, mTestArgs, mModuleArgs, mIncludeFiltersParsed, mExcludeFiltersParsed);
            mModuleRepo =
                    createModuleLoader(
                            mIncludeFiltersParsed, mExcludeFiltersParsed, mTestArgs, mModuleArgs);
            // Actual loading of the configurations.
            // fetches the configs of the modules to run
            return loadingStrategy(abis, testsDir, mSuitePrefix, mSuiteTag);
        } catch (DeviceNotAvailableException | FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
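The regrouping of flat filter strings into a per-module map can be illustrated with a stand-alone sketch. This is not the SuiteModuleLoader implementation: FilterGroupingSketch is hypothetical, it ignores the ABI handling of the real method, and it assumes every filter has the form "<abi> <module> [<test>]".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grouping of filter strings by module; not the SuiteModuleLoader code. */
class FilterGroupingSketch {

    /** Groups filters of the assumed form "<abi> <module> [<test>]" by module name. */
    static Map<String, List<String>> groupByModule(List<String> filters) {
        Map<String, List<String>> parsed = new LinkedHashMap<>();
        for (String filter : filters) {
            String[] parts = filter.trim().split("\\s+");
            if (parts.length < 2 || parts.length > 3) {
                throw new IllegalArgumentException("unexpected filter: " + filter);
            }
            List<String> tests = parsed.computeIfAbsent(parts[1], key -> new ArrayList<>());
            if (parts.length == 3) {
                tests.add(parts[2]); // test-level filter; a module-level filter adds no test
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> filters = Arrays.asList(
                "arm64-v8a CtsDeqpTestCases dEQP.hypothetical.case0",
                "arm64-v8a CtsDeqpTestCases dEQP.hypothetical.case1",
                "arm64-v8a CtsHypotheticalTestCases");
        System.out.println(groupByModule(filters));
    }
}
```

Grouping changes the shape of the data but not its size: every test-level exclude still occupies one list element, so an interrupted CtsDeqpTestCases run keeps all of its entries after parsing.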
    /**
     * Default loading strategy will load from the resources and the tests directory. Can be
     * extended or replaced.
     *
     * @param abis The set of abis to run against.
     * @param testsDir The tests directory.
     * @param suitePrefix A prefix to filter the resource directory.
     * @param suiteTag The suite tag a module should have to be included. Can be null.
     * @return A list of loaded configuration for the suite.
     */
    public LinkedHashMap<String, IConfiguration> loadingStrategy(
            Set<IAbi> abis, File testsDir, String suitePrefix, String suiteTag) {
        LinkedHashMap<String, IConfiguration> loadedConfigs = new LinkedHashMap<>();
        // Load configs that are part of the resources
        if (!mSkipJarLoading) {
            loadedConfigs.putAll(
                    getModuleLoader().loadConfigsFromJars(abis, suitePrefix, suiteTag));
        }

        // Load the configs that are part of the tests dir
        if (mConfigPatterns.isEmpty()) {
            // If no special pattern was configured, use the default configuration patterns we know
            mConfigPatterns.add(".*\\.config");
            mConfigPatterns.add(".*\\.xml");
        }
        loadedConfigs.putAll(
                getModuleLoader()
                        .loadConfigsFromDirectory(
                                testsDir, abis, suitePrefix, suiteTag, mConfigPatterns));
        return loadedConfigs;
    }
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/ModuleSplitter.java
Next, splitConfiguration is called:
    /**
     * Create a List of executable unit {@link ModuleDefinition}s based on the map of configuration
     * that was loaded.
     *
     * @param runConfig {@link LinkedHashMap} loaded from {@link ITestSuite#loadTests()}.
     * @param shardCount a shard count hint to help with sharding.
     * @return List of {@link ModuleDefinition}
     */
    public static List<ModuleDefinition> splitConfiguration(
            LinkedHashMap<String, IConfiguration> runConfig,
            int shardCount,
            boolean dynamicModule) {
        if (dynamicModule) {
            // We maximize the sharding for dynamic to reduce time difference between first and
            // last shard as much as possible. Overhead is low due to our test pooling.
            shardCount *= 2;
        }
        List<ModuleDefinition> runModules = new ArrayList<>();
        for (Entry<String, IConfiguration> configMap : runConfig.entrySet()) {
            // Check that it's a valid configuration for suites, throw otherwise.
            ValidateSuiteConfigHelper.validateConfig(configMap.getValue());

            // creates the ModuleDefinition from the module name, config and shard count
            createAndAddModule(
                    runModules,
                    configMap.getKey(),
                    configMap.getValue(),
                    shardCount,
                    dynamicModule);
        }
        return runModules;
    }
    private static void createAndAddModule(
            List<ModuleDefinition> currentList,
            String moduleName,
            IConfiguration config,
            int shardCount,
            boolean dynamicModule) {
        // If this particular configuration module is declared as 'not shardable' we take it whole
        // but still split the individual IRemoteTest in a pool.
        if (config.getConfigurationDescription().isNotShardable()
                || (!dynamicModule
                        && config.getConfigurationDescription().isNotStrictShardable())) {
            for (int i = 0; i < config.getTests().size(); i++) {
                if (dynamicModule) {
                    ModuleDefinition module =
                            new ModuleDefinition(
                                    moduleName,
                                    config.getTests(),
                                    clonePreparersMap(config),
                                    clonePreparers(config.getMultiTargetPreparers()),
                                    config);
                    currentList.add(module);
                } else {
                    addModuleToListFromSingleTest(
                            currentList, config.getTests().get(i), moduleName, config);
                }
            }
            return;
        }

        // If configuration is possibly shardable we attempt to shard it.
        for (IRemoteTest test : config.getTests()) {
            if (test instanceof IShardableTest) {
                Collection<IRemoteTest> shardedTests = ((IShardableTest) test).split(shardCount);
                if (shardedTests != null) {
                    // Test did shard we put the shard pool in ModuleDefinition which has a polling
                    // behavior on the pool.
                    if (dynamicModule) {
                        for (int i = 0; i < shardCount; i++) {
                            ModuleDefinition module =
                                    new ModuleDefinition(
                                            moduleName,
                                            shardedTests,
                                            clonePreparersMap(config),
                                            clonePreparers(config.getMultiTargetPreparers()),
                                            config);
                            currentList.add(module);
                        }
                    } else {
                        // We create independent modules with each sharded test.
                        for (IRemoteTest moduleTest : shardedTests) {
                            addModuleToListFromSingleTest(
                                    currentList, moduleTest, moduleName, config);
                        }
                    }
                    continue;
                }
            }
            // test is not shardable or did not shard
            addModuleToListFromSingleTest(currentList, test, moduleName, config);
        }
    }
Once the ModuleDefinition list has been created, ITestSuite.split continues with it:
    for (ModuleDefinition m : splitModules) {
        ITestSuite suite = createInstance();
        // note: the options of the CompatibilityTestSuite created earlier are copied here
        OptionCopier.copyOptionsNoThrow(this, suite);
        suite.mIsSharded = true;
        suite.mDirectModule = m; // the new suite gets the ModuleDefinition just created
        splitTests.add(suite); // the CompatibilityTestSuite list
    }
splitTests here is the CompatibilityTestSuite list that the hprof shows as the cause of the failure.
tools/tradefederation/core/src/com/android/tradefed/config/OptionCopier.java
    /**
     * Identical to {@link #copyOptions(Object, Object)} but will log instead of throw if exception
     * occurs.
     */
    public static void copyOptionsNoThrow(Object source, Object dest) {
        try {
            copyOptions(source, dest);
        } catch (ConfigurationException e) {
            CLog.e(e);
        }
    }
    /**
     * Copy the values from {@link Option} fields in <var>origObject</var> to <var>destObject</var>
     *
     * @param origObject the {@link Object} to copy from
     * @param destObject the {@link Object} to copy to
     * @throws ConfigurationException if options failed to copy
     */
    public static void copyOptions(Object origObject, Object destObject)
            throws ConfigurationException {
        Collection<Field> origFields = OptionSetter.getOptionFieldsForClass(origObject.getClass());
        Map<String, Field> destFieldMap = getFieldOptionMap(destObject);
        for (Field origField : origFields) {
            final Option option = origField.getAnnotation(Option.class);
            Field destField = destFieldMap.remove(option.name());
            if (destField != null) {
                Object origValue = OptionSetter.getFieldValue(origField,
                        origObject);
                OptionSetter.setFieldValue(option.name(), destObject, destField, origValue);
            }
        }
    }
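The stack trace shows OptionSetter.setFieldValue ending in AbstractCollection.addAll and HashSet.add, which means a collection-typed option is copied element by element into the destination's own collection. The sketch below reproduces only that effect with plain reflection. The @Opt annotation, the Suite class, and the single-class copyOptions are hypothetical simplifications, not the Tradefed classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical reduction of the option copy; not the Tradefed implementation. */
class OptionCopySketch {

    /** Marker annotation standing in for Tradefed's @Option. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Opt {
        String name();
    }

    /** Suite with a single collection-typed option. */
    static class Suite {
        @Opt(name = "exclude-filter")
        Set<String> excludeFilters = new HashSet<>();
    }

    /** Copies annotated fields; collections are copied element by element via addAll. */
    @SuppressWarnings("unchecked")
    static void copyOptions(Object source, Object dest) {
        try {
            for (Field field : source.getClass().getDeclaredFields()) {
                if (field.getAnnotation(Opt.class) == null) {
                    continue; // not an option field
                }
                field.setAccessible(true);
                Object value = field.get(source);
                if (value instanceof Collection) {
                    Collection<Object> target = (Collection<Object>) field.get(dest);
                    target.addAll((Collection<Object>) value); // dest gets its own entries
                } else {
                    field.set(dest, value);
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Suite original = new Suite();
        original.excludeFilters.add("arm64-v8a CtsDeqpTestCases dEQP.hypothetical.case0");
        Suite shard = new Suite();
        copyOptions(original, shard);
        System.out.println("equal content: " + shard.excludeFilters.equals(original.excludeFilters));
        System.out.println("same object: " + (shard.excludeFilters == original.excludeFilters));
    }
}
```

The destination ends up with equal content in a distinct HashSet. The strings themselves are shared, but every copy allocates its own hash table and one node per entry, which matches the HashMap.newNode frame at the top of the trace.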
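In the end a large number of CompatibilityTestSuite instances are copied out (when many modules need a retry), and each of them holds a huge number of exclude entries (about 350,000). This is what ultimately produces the error in the log.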
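As a rough worked estimate, let M be the number of CompatibilityTestSuite instances produced by the split and N the number of exclude entries each one holds. N is about 350,000 as stated above; the value M = 100 is purely hypothetical and only illustrates the scale.

```latex
E_{\text{total}} = M \times N, \qquad N \approx 3.5 \times 10^{5}

% hypothetical example with M = 100 suites
E_{\text{total}} \approx 100 \times 3.5 \times 10^{5} = 3.5 \times 10^{7}
```

Each of these entries needs its own hash node in the copied set, so the total grows linearly with the number of modules being retried, even though only one module contributed the excludes.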
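Summary
- When the very large CtsDeqpTestCases module is tested and an adb disconnect or a similar interruption occurs just before it finishes, the module is left with done = false. On retry, a huge number of exclude entries are therefore recorded in the CompatibilityTestSuite.
- CompatibilityTestSuite is copied during a multi-device retry, which amplifies the problem and leads to the failure.
- Temporary workaround: run the CtsDeqpTestCases module on its own. Even if that run is interrupted, retrying the CtsDeqpTestCases report alone does not go through the copy step, so as long as this module is tested separately the problem should not reproduce. Google also permits testing it this way.
- Suggest that Google modify the CTS framework, for example by removing exclude entries that the retry does not need, or by sharing a single exclude list (singleton style) when CompatibilityTestSuite is copied. This is best fixed by Google, who know the logic better and have a dedicated team iterating on it.
- The first patch submitted to Google is only one possible approach and not a very good one; the recommendation is still that Google fix this issue.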
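The two framework suggestions in the list above can be sketched as follows. This is only an illustration of the idea, not the patch sent to Google and not Tradefed code: ExcludeReductionSketch and both methods are hypothetical, and the filter layout "<abi> <module> [<test>]" is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustration of the two suggestions; hypothetical, not a Tradefed patch. */
class ExcludeReductionSketch {

    /** Keeps only the excludes that name the given ABI and module. */
    static Set<String> pruneForModule(Set<String> allExcludes, String abi, String module) {
        String moduleFilter = abi + " " + module; // assumed layout: "<abi> <module> [<test>]"
        Set<String> kept = new HashSet<>();
        for (String filter : allExcludes) {
            if (filter.equals(moduleFilter) || filter.startsWith(moduleFilter + " ")) {
                kept.add(filter);
            }
        }
        return kept;
    }

    /** Wraps the set once so every shard can hold the same read-only view. */
    static Set<String> shareReadOnly(Set<String> allExcludes) {
        return Collections.unmodifiableSet(allExcludes);
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Arrays.asList(
                "arm64-v8a CtsDeqpTestCases dEQP.hypothetical.case0",
                "arm64-v8a CtsDeqpTestCases dEQP.hypothetical.case1",
                "arm64-v8a CtsHypotheticalTestCases"));
        Set<String> forOther = pruneForModule(all, "arm64-v8a", "CtsHypotheticalTestCases");
        System.out.println("excludes kept for the other module: " + forOther.size());
        System.out.println("entries visible through the shared view: " + shareReadOnly(all).size());
    }
}
```

Pruning means a suite that runs one module no longer carries the dEQP excludes at all, while the shared view keeps a single copy for every suite. Which of the two fits the real option-copy path is a design decision for the framework owners.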