Deep Learning Analysis and Data Management Systems
DeepBase: Deep Inspection of Neural Networks
Introduction
- Neural networks (NNs) are revolutionizing a wide range of machine intelligence tasks.
- However, it remains unclear how and why neural networks are so effective.
- One approach to understanding how NNs work is studying how and when neurons activate for test data; this class of analysis is called Deep Neural Inspection (DNI).
Deep Neural Inspection (DNI)
Given user-provided hypothesis logic (e.g., “detects nouns”, “detects keywords”), DNI quantifies the extent to which the behavior of hidden units (e.g., the magnitude or the derivative of their output) matches the hypothesis logic when the model is run over a test set. This quantification is performed using statistical measures.
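As a concrete illustration of the quantification step, a minimal sketch (not DeepBase's actual implementation) can score each hidden unit by the Pearson correlation between its activations over a test set and a 0/1 hypothesis signal; the function name `dni_score` is illustrative:

```python
import numpy as np

def dni_score(activations, hypothesis):
    """Score how closely each hidden unit's behavior tracks a
    user-provided hypothesis over a test set.

    activations: (n_inputs, n_units) array of unit outputs
    hypothesis:  (n_inputs,) array of hypothesis-logic labels
                 (e.g., 1 where a token is a noun, else 0)
    Returns one affinity score per hidden unit (Pearson correlation).
    """
    # Standardize both signals, then take the mean product,
    # which equals the Pearson correlation coefficient.
    acts = (activations - activations.mean(0)) / (activations.std(0) + 1e-9)
    hyp = (hypothesis - hypothesis.mean()) / (hypothesis.std() + 1e-9)
    return acts.T @ hyp / len(hyp)
```

A score near 1 suggests the unit activates in step with the hypothesis; scores near 0 suggest no relationship.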
Approaches for Interpretation
- Saliency Analysis
- Statistical Analysis
DeepBase Optimizations
- Shared Computation via Model Merging
- Early Stopping
- Streaming Behavior Extraction
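One plausible reading of the streaming-extraction and early-stopping ideas (a sketch only; the pruning rule and thresholds here are illustrative, not DeepBase's): maintain running sufficient statistics for each unit's score as test batches stream by, and stop tracking units that fall well below the current top-k:

```python
import numpy as np

def streaming_scores(batches, k=3, margin=0.3):
    """Stream (acts, hyp) batches, where acts is (b, n_units) and hyp is (b,).
    Maintains running Pearson-correlation estimates from incremental sums
    and prunes units trailing the k-th best score by more than `margin`.
    In a real system, pruned units would also skip behavior extraction.
    """
    stats, alive = None, None
    for acts, hyp in batches:
        if stats is None:
            n_units = acts.shape[1]
            alive = np.ones(n_units, dtype=bool)
            stats = {"n": 0, "sx": np.zeros(n_units), "sy": 0.0,
                     "sxx": np.zeros(n_units), "syy": 0.0,
                     "sxy": np.zeros(n_units)}
        # Incrementally update sums needed for Pearson correlation.
        stats["n"] += len(hyp)
        stats["sx"] += acts.sum(0); stats["sy"] += hyp.sum()
        stats["sxx"] += (acts ** 2).sum(0); stats["syy"] += (hyp ** 2).sum()
        stats["sxy"] += acts.T @ hyp
        n = stats["n"]
        cov = stats["sxy"] / n - (stats["sx"] / n) * (stats["sy"] / n)
        vx = stats["sxx"] / n - (stats["sx"] / n) ** 2
        vy = stats["syy"] / n - (stats["sy"] / n) ** 2
        r = cov / np.sqrt(vx * vy + 1e-12)
        # Early stopping: drop units far below the current k-th best.
        kth = np.sort(r[alive])[-min(k, alive.sum()):][0]
        alive &= r >= kth - margin
    return r, alive
```

The top scorer always survives its own threshold, so the alive set never empties; pruning only ever removes units that cannot plausibly reach the top-k.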
Dissecting the Performance of Strongly Consistent Replication Protocols
Distributed Consensus
- The fault-tolerant distributed consensus problem is addressed by Paxos and its variants.
- The performance of Paxos protocols is critical for the overall performance of distributed databases and systems.
- Paxos protocols show widely varying performance based on network, workload, deployment size, topology, and failures.
Data Platform for Machine Learning
ML Frameworks and Data
- Most platforms do not support robust data management.
- ML data evolves in three ways:
- Variety: Rich data types (images, text, sensor data).
- Volume: Size of the dataset.
- Velocity: Rate of change (slow for raw data, faster for annotations and features).
Future Work
- Data exploration vs. catalog searches
- Reducing data latency and human-in-the-loop time
- Prefetching prediction and local buffering
- Ecosystem integration
- MLdp's own format (linear number of format connectors)
- Native data formats (quadratic number of format connectors)
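The linear-vs-quadratic trade-off is simple counting: if every external format converts through one shared MLdp-style format, each format needs a single connector; if formats convert directly to each other, every ordered pair needs one. A quick sketch of the arithmetic:

```python
def connector_count(n_formats, shared_format=True):
    """Number of converters needed so n_formats can interoperate.

    shared_format=True:  each format converts to/from one common
                         (MLdp-style) format -> linear in n_formats.
    shared_format=False: each format converts directly to every other
                         format -> one converter per ordered pair.
    """
    if shared_format:
        return n_formats
    return n_formats * (n_formats - 1)
```

For example, 10 formats need 10 connectors through a shared format but 90 direct pairwise converters, which is the argument for the MLdp format despite the cost of defining one.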
HoloDetect: Few-Shot Learning for Error Detection
Motivation
- Data errors are a persistent issue.
- Analytical tasks require high-quality data; without it, the result is “garbage in, garbage out.”
- Current solutions—handcrafted rule-based methods, pattern-driven methods, and outlier detection—often suffer from low accuracy, lack of generalization, and high maintenance costs.
Methodology
- Training data acquisition
- Data augmentation
- Transformation learning
- Generating error samples
- Experiments: Compared methods and model variants
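To make the augmentation step concrete, here is a hand-rolled sketch of generating error samples by perturbing clean cell values with character-level transformations (deletion, substitution, transposition). The function and its fixed transformation set are illustrative; HoloDetect learns its transformations from data rather than hard-coding them:

```python
import random

def synthesize_errors(clean_value, n_samples=3, seed=0):
    """Synthesize erroneous variants of a clean cell value by applying
    one random character-level transformation per sample."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    samples = []
    for _ in range(n_samples):
        chars = list(clean_value)
        op = rng.choice(["delete", "substitute", "swap"])
        i = rng.randrange(len(chars))
        if op == "delete" and len(chars) > 1:
            del chars[i]                       # drop a character
        elif op == "substitute":
            chars[i] = rng.choice(alphabet)    # replace a character
        elif op == "swap" and len(chars) > 1:
            j = min(i + 1, len(chars) - 1)
            chars[i], chars[j] = chars[j], chars[i]  # transpose neighbors
        samples.append("".join(chars))
    return samples
```

Pairing each synthesized sample with its clean original yields labeled positive (erroneous) examples, which addresses the class imbalance the conclusion mentions.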
Conclusion
Error detection can be effectively handled as an ML problem. Training data acquisition is a significant challenge due to imbalanced data, but generative processes can be used to augment data by synthesizing erroneous samples. HoloDetect achieved an average precision of ~94% and an average recall of ~93% across a diverse array of datasets.
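For reference, the two reported metrics can be computed over sets of flagged cells as follows (a generic sketch, not HoloDetect's evaluation code):

```python
def precision_recall(predicted, actual):
    """Precision and recall for error detection, where `predicted` and
    `actual` are sets of cell coordinates flagged as erroneous."""
    tp = len(predicted & actual)  # true positives: correctly flagged cells
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```

High precision means few clean cells are falsely flagged; high recall means few true errors are missed, and HoloDetect's ~94%/~93% indicates both at once.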