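This service has been included since Dremio 25.0 and is also mentioned in the release notes. This post mainly walks through its internal implementation.
Release information
The release note is quoted below. In short, it adds a daily task that trims the history of each view to at most 50 records.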
Added daily catalog maintenance tasks to trim history of views to a maximum of 50 records per view. This limits the storage needed for datasetVersions records in the KV store.
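Internal implementation
The work is done by the tasks that CatalogMaintenanceRunnableProvider supplies, and those tasks are started by the CatalogMaintenanceService service.
- Service registration
The code shows that the service is only registered on the coordinator node, which matches Dremio's usual pattern.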
private void registerCatalogMaintenanceService(
    SingletonRegistry registry, boolean isCoordinator) {
  if (isCoordinator) {
    registry.bindSelf(
        new CatalogMaintenanceService(
            registry.provider(SchedulerService.class),
            registry.provider(java.util.concurrent.ExecutorService.class),
            new CatalogMaintenanceRunnableProvider(
                    registry.provider(OptionManager.class),
                    registry.provider(KVStoreProvider.class).get())
                .get(0)));
  }
}
- Inside CatalogMaintenanceRunnableProvider
CatalogMaintenanceRunnable.builder()
    .setName("TrimVersions")
    .setSchedule(makeDailySchedule(trimVersionsTime))
    .setRunnable(
        () ->
            DatasetVersionTrimmer.trimHistory(
                Clock.systemUTC(),
                storeProvider.getStore(DatasetVersionMutator.VersionStoreCreator.class),
                // This value is 50
                (int) optionManager.getOption(NamespaceOptions.DATASET_VERSIONS_LIMIT),
                minAgeInDays))
    .build());
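What a daily schedule boils down to can be sketched with plain JDK classes. The snippet below is an illustration only, not Dremio's SchedulerService or makeDailySchedule: the class name DailyDelaySketch, the method delayUntilNextRun, and the 01:00 UTC run time are all assumptions made up for the example. It computes how long a once-a-day scheduler would have to wait until the next occurrence of a fixed time of day.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper for illustration; not part of Dremio.
class DailyDelaySketch {

  // Returns the time to wait from "now" (as seen by the clock) until the
  // next occurrence of runAt, interpreted in UTC.
  static Duration delayUntilNextRun(Clock clock, LocalTime runAt) {
    ZonedDateTime now = ZonedDateTime.now(clock.withZone(ZoneOffset.UTC));
    ZonedDateTime next = now.with(runAt);
    if (!next.isAfter(now)) {
      // Today's slot has already passed (or is exactly now), so use tomorrow's.
      next = next.plusDays(1);
    }
    return Duration.between(now, next);
  }

  public static void main(String[] args) {
    // Fixed clock so the result is reproducible: 23:30 UTC, task due at 01:00 UTC.
    Clock fixed = Clock.fixed(Instant.parse("2024-05-01T23:30:00Z"), ZoneOffset.UTC);
    System.out.println(delayUntilNextRun(fixed, LocalTime.of(1, 0))); // prints PT1H30M
  }
}
```

Passing a Clock in, as the Dremio excerpt also does with Clock.systemUTC(), keeps the time-dependent logic testable with a fixed clock.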
- DatasetVersionTrimmer.trimHistory
This is where the actual trimming happens. The code comments explain most of the logic, so read them alongside the code.
private void trimHistory(int maxVersionsToKeep, int minAgeInDays) {
  Preconditions.checkArgument(maxVersionsToKeep > 0, "maxVersionsToKeep must be positive");
  Preconditions.checkArgument(minAgeInDays > 0, "minAgeInDays must be positive");
  // Assume number of datasets is somewhat small compared to number of versions.
  // First pass: count versions per dataset.
  Map<DatasetPath, Integer> counts = Maps.newHashMap();
  for (Document<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> entry :
      datasetVersionsStore.find()) {
    counts.compute(entry.getKey().getPath(), (key, count) -> count != null ? count + 1 : 1);
  }
  // Collect and order paths with more than requested number of versions.
  ImmutableList<Map.Entry<DatasetPath, Integer>> pathsWithCounts =
      counts.entrySet().stream()
          .sorted(Comparator.comparing(e -> e.getKey().toPathString()))
          .collect(ImmutableList.toImmutableList());
  ImmutableSet<DatasetPath> pathsSet =
      pathsWithCounts.stream()
          .filter(e -> e.getValue() > maxVersionsToKeep)
          .map(Map.Entry::getKey)
          .collect(ImmutableSet.toImmutableSet());
  if (!pathsSet.isEmpty()) {
    // Second pass: get versions to delete (past the maxVersionsToKeep) and update (set previous
    // version to null in the last element of the kept history).
    ArrayList<DatasetVersionMutator.VersionDatasetKey> keysToDelete = new ArrayList<>();
    Map<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> versionsToUpdate =
        Maps.newHashMap();
    DatasetPath startPath = pathsWithCounts.get(0).getKey();
    int versionsInRange = 0;
    for (int index = 0; index < pathsWithCounts.size(); index++) {
      Map.Entry<DatasetPath, Integer> pathAndCount = pathsWithCounts.get(index);
      DatasetPath endPath = pathAndCount.getKey();
      versionsInRange += pathAndCount.getValue();
      if (versionsInRange < MAX_VERSIONS_IN_RANGE && index + 1 < pathsWithCounts.size()) {
        continue;
      }
      // Collect versions to trim/update in the range.
      logger.info("Collecting records to trim, batch: s: {} e: {}", startPath, endPath);
      keysToDelete.clear();
      versionsToUpdate.clear();
      findVersionKeysToTrim(
          startPath,
          endPath,
          pathsSet,
          maxVersionsToKeep,
          minAgeInDays,
          keysToDelete,
          versionsToUpdate);
      // Update versions first, for any partial updates due to errors/conflicts etc, next run will
      // fix it.
      logger.info("Updating batch of {} older dataset versions", versionsToUpdate.size());
      for (Map.Entry<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> entry :
          versionsToUpdate.entrySet()) {
        datasetVersionsStore.put(entry.getKey(), entry.getValue());
      }
      for (List<DatasetVersionMutator.VersionDatasetKey> keysRange :
          Lists.partition(keysToDelete, MAX_VERSIONS_TO_DELETE)) {
        logger.info("Deleting batch of {} older dataset versions", keysRange.size());
        datasetVersionsStore.bulkDelete(keysRange);
      }
      // Reset range.
      startPath = endPath;
      versionsInRange = 0;
    }
  }
}
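The per-view effect of the two passes can be illustrated on a small in-memory list. The sketch below is a simplified model under stated assumptions, not Dremio's code: the TrimSketch and Version classes and their fields are hypothetical, the history is assumed to be sorted newest first, and the way minAgeInDays interacts with the limit (a version past the limit is only deleted once it is older than minAgeInDays) is a guess based on the parameter names, since findVersionKeysToTrim is not shown in the excerpt.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of per-view history trimming; not Dremio's implementation.
class TrimSketch {

  // One saved version of a view; "previous" is the id of the next older version, or null.
  static class Version {
    final String id;
    final Instant createdAt;
    String previous;

    Version(String id, Instant createdAt, String previous) {
      this.id = id;
      this.createdAt = createdAt;
      this.previous = previous;
    }
  }

  // Trims "history" (sorted newest first) in place and returns the ids that were removed.
  static List<String> trim(
      List<Version> history, int maxVersionsToKeep, int minAgeInDays, Instant now) {
    if (maxVersionsToKeep <= 0 || minAgeInDays <= 0) {
      throw new IllegalArgumentException("maxVersionsToKeep and minAgeInDays must be positive");
    }
    Instant cutoff = now.minus(Duration.ofDays(minAgeInDays));
    int keep = Math.min(history.size(), maxVersionsToKeep);
    // Assumption: versions beyond the limit survive until they are older than the cutoff.
    while (keep < history.size() && !history.get(keep).createdAt.isBefore(cutoff)) {
      keep++;
    }
    List<String> deleted = new ArrayList<>();
    for (Version v : history.subList(keep, history.size())) {
      deleted.add(v.id);
    }
    if (!deleted.isEmpty()) {
      // The oldest kept version must not point at a record that no longer exists.
      history.get(keep - 1).previous = null;
      history.subList(keep, history.size()).clear();
    }
    return deleted;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-05-10T00:00:00Z");
    List<Version> history = new ArrayList<>();
    // v5 is the newest (12 hours old); each older version is 24 hours older than the last.
    for (int i = 5; i >= 1; i--) {
      Instant createdAt = now.minus(Duration.ofHours(24L * (6 - i) - 12));
      history.add(new Version("v" + i, createdAt, i > 1 ? "v" + (i - 1) : null));
    }
    // Limit of 2, but v3 is younger than 3 days, so only v2 and v1 are removed.
    System.out.println(trim(history, 2, 3, now)); // prints [v2, v1]
  }
}
```

Clearing the previous pointer before removing the older records mirrors the order in the excerpt, where the updates are written before the bulk deletes so that an interrupted run can be repaired by the next one.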
Notes
CatalogMaintenanceService is a newly added service module. Reading the release note together with the source code makes its behavior much clearer.
References
dac/backend/src/main/java/com/dremio/dac/service/catalog/CatalogMaintenanceRunnableProvider.java
services/catalog/src/main/java/com/dremio/catalog/CatalogMaintenanceService.java
dac/backend/src/main/java/com/dremio/dac/service/datasets/DatasetVersionMutator.java
dac/backend/src/main/java/com/dremio/dac/service/datasets/DatasetVersionTrimmer.java
dac/backend/src/main/java/com/dremio/dac/daemon/DACDaemonModule.java