Deep Learning Analysis and Data Management Systems


DeepBase: Deep Inspection of Neural Networks

Introduction

  • Neural networks (NNs) are revolutionizing a wide range of machine intelligence tasks.
  • However, it remains unclear how and why neural networks are so effective.
  • One approach to understanding how NNs work is studying how and when neurons activate for test data; this class of analysis is called Deep Neural Inspection (DNI).

Deep Neural Inspection (DNI)

Given user-provided hypothesis logic (e.g., “detects nouns”, “detects keywords”), DNI seeks to quantify the extent to which the behavior of hidden units (e.g., the magnitude or the derivative of their output) matches the hypothesis logic when running the model over a test set. This quantification is performed using statistical measures.
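The statistical measure above can be as simple as a correlation between a unit's activations and the hypothesis labels. Here is a minimal sketch; the function name, toy data, and choice of Pearson correlation are illustrative assumptions, not DeepBase's actual API:

```python
import numpy as np

def dni_score(activations, hypothesis):
    """Pearson correlation between one hidden unit's activations over
    test tokens and the 0/1 labels produced by the hypothesis logic.
    (Illustrative measure; DeepBase supports several statistical scores.)"""
    activations = np.asarray(activations, dtype=float)
    hypothesis = np.asarray(hypothesis, dtype=float)
    return float(np.corrcoef(activations, hypothesis)[0, 1])

# Activations of one hidden unit over six test tokens...
acts = [0.9, 0.1, 0.8, 0.2, 0.95, 0.15]
# ...and the hypothesis labels (1 = token is a noun).
nouns = [1, 0, 1, 0, 1, 0]

score = dni_score(acts, nouns)  # near 1.0: the unit tracks the hypothesis
```

A score near 1 (or -1) suggests the unit's behavior aligns with the hypothesis; a score near 0 suggests no relationship.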

Approaches for Interpretation

  1. Saliency Analysis
  2. Statistical Analysis
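Saliency analysis asks how sensitive a model's output is to each input feature, typically via gradients. A minimal sketch using finite differences stands in for autodiff here; the toy model and helper names are assumptions for illustration:

```python
import numpy as np

def model(x):
    # toy "network": a fixed weighted sum followed by tanh
    w = np.array([2.0, -0.5, 0.0])
    return np.tanh(w @ x)

def saliency(f, x, eps=1e-5):
    """Approximate |df/dx_i| per input feature via finite differences.
    (Real systems use autodiff; this is just the idea.)"""
    base = f(x)
    grads = np.zeros_like(x)
    for i in range(len(x)):
        bumped = x.copy()
        bumped[i] += eps
        grads[i] = (f(bumped) - base) / eps
    return np.abs(grads)  # magnitude = feature importance

s = saliency(model, np.array([0.1, 0.2, 0.3]))
# the first feature (largest weight) gets the highest saliency
```

Statistical analysis, by contrast, correlates unit behavior with hypothesis labels over a test set rather than attributing a single prediction to its inputs.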

DeepBase Optimizations

  • Shared Computation via Model Merging
  • Early Stopping
  • Streaming Behavior Extraction
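To make the early-stopping idea concrete: instead of scoring a hypothesis over the entire test set, score it on growing prefixes and stop once the estimate stabilizes. This is a hedged sketch of the concept only; the convergence test, thresholds, and names are assumptions, not DeepBase's actual algorithm:

```python
import numpy as np

def early_stopped_score(activations, labels, chunk=100, tol=0.01):
    """Estimate a correlation score on growing samples; stop early
    once consecutive estimates agree within `tol`."""
    prev = None
    for n in range(chunk, len(activations) + chunk, chunk):
        est = float(np.corrcoef(activations[:n], labels[:n])[0, 1])
        if prev is not None and abs(est - prev) < tol:
            return est, n          # converged: skip the remaining data
        prev = est
    return prev, len(activations)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=5000)
acts = labels + rng.normal(0, 0.1, size=5000)  # unit that tracks labels
score, seen = early_stopped_score(acts, labels)
```

When a unit's score converges quickly (or is clearly far from any threshold of interest), most of the test set never needs to be processed.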

Dissecting the Performance of Strongly Consistent Replication Protocols

Distributed Consensus

  • The fault-tolerant distributed consensus problem is addressed by Paxos and its variants.
  • The performance of Paxos protocols is critical for the overall performance of distributed databases and systems.
  • Paxos protocols show widely varying performance based on network, workload, deployment size, topology, and failures.

Data Platform for Machine Learning

ML Frameworks and Data

  • Most ML frameworks and platforms lack robust data management support.
  • ML data evolves in three ways:
    • Variety: Rich data types (images, text, sensor data).
    • Volume: Size of the dataset.
    • Velocity: Rate of change (slow for raw data, faster for annotations and features).

Future Work

  • Data exploration vs. catalog searches
  • Reducing data latency and human-in-the-loop time
  • Prefetching prediction and local buffering
  • Ecosystem integration
  • MLdp's own format (linear number of format connectors)
  • Native data formats (quadratic number of format connectors)
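The linear-versus-quadratic contrast in the last two bullets is simple counting: with one common intermediate format, each external format needs a single connector, whereas direct native-to-native conversion needs a connector for every ordered pair of formats. The counts below are illustrative:

```python
def connectors_via_hub(num_formats):
    # one connector per external format, all through the common format
    return num_formats

def connectors_pairwise(num_formats):
    # one connector for every ordered (source, target) pair
    return num_formats * (num_formats - 1)

for f in (3, 5, 10):
    print(f, connectors_via_hub(f), connectors_pairwise(f))
# with 10 formats: 10 connectors through a hub vs. 90 pairwise
```

This is why a platform-owned format keeps integration effort linear as the ecosystem of formats grows.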

HoloDetect: Few-Shot Learning for Error Detection

Motivation

  • Data errors are a persistent issue.
  • Analytical tasks require high-quality data; without it, the result is “garbage in, garbage out.”
  • Current solutions—handcrafted rule-based methods, pattern-driven methods, and outlier detection—often suffer from low accuracy, lack of generalization, and high maintenance costs.

Methodology

  • Training data acquisition
  • Data augmentation
  • Transformation learning
  • Generating error samples
  • Experiments: Compared methods and model variants
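The "generating error samples" step can be sketched as follows: corrupt clean cell values with plausible transformations (typos, truncation) so the training set is no longer dominated by clean examples. The specific corruption functions below are illustrative assumptions; HoloDetect itself *learns* such transformations from example errors rather than hard-coding them:

```python
import random

def swap_chars(v):
    """Transposition typo, e.g. "Chicago" -> "hCicago"."""
    if len(v) < 2:
        return v
    i = random.randrange(len(v) - 1)
    return v[:i] + v[i + 1] + v[i] + v[i + 2:]

def drop_suffix(v):
    """Truncation error, e.g. "60614-3113" -> "60614"."""
    return v.split("-")[0]

CORRUPTIONS = [swap_chars, drop_suffix]

def augment(clean_values, n_errors):
    """Return (value, is_error) pairs mixing clean and corrupted cells."""
    random.seed(42)  # deterministic for the example
    samples = [(v, 0) for v in clean_values]
    for _ in range(n_errors):
        v = random.choice(clean_values)
        samples.append((random.choice(CORRUPTIONS)(v), 1))
    return samples

data = augment(["Chicago", "60614-3113", "Illinois"], n_errors=3)
```

The resulting balanced sample of clean and erroneous cells can then train an ordinary binary classifier for error detection.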

Conclusion

Error detection can be effectively handled as an ML problem. Training data acquisition is a significant challenge due to imbalanced data, but generative processes can be used to augment data by synthesizing erroneous samples. HoloDetect achieved an average precision of ~94% and an average recall of ~93% across a diverse array of datasets.
