Deep Learning Analysis and Data Management Systems


DeepBase: Deep Inspection of Neural Networks

Introduction

  • Neural networks (NNs) are revolutionizing a wide range of machine intelligence tasks.
  • However, it remains unclear how and why neural networks are so effective.
  • One approach to understanding how NNs work is studying how and when neurons activate for test data; this class of analysis is called Deep Neural Inspection (DNI).

Deep Neural Inspection (DNI)

Given user-provided hypothesis logic (e.g., “detects nouns”, “detects keywords”), DNI seeks to quantify the extent to which the behavior of hidden units (e.g., the magnitude or the derivative of their output) matches the hypothesis logic when running the model over a test set. This quantification is performed using statistical measures.
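The statistical measure above can be as simple as a correlation between a unit's activations and the hypothesis labels. Here is a minimal sketch; the function name, toy data, and choice of Pearson correlation are illustrative assumptions, not DeepBase's actual API:

```python
import numpy as np

def dni_score(activations, hypothesis):
    """Pearson correlation between one hidden unit's activations over
    test tokens and the 0/1 labels produced by the hypothesis logic.
    (Illustrative measure; DeepBase supports several statistical scores.)"""
    activations = np.asarray(activations, dtype=float)
    hypothesis = np.asarray(hypothesis, dtype=float)
    return float(np.corrcoef(activations, hypothesis)[0, 1])

# Activations of one hidden unit over six test tokens...
acts = [0.9, 0.1, 0.8, 0.2, 0.95, 0.15]
# ...and the hypothesis labels (1 = token is a noun).
nouns = [1, 0, 1, 0, 1, 0]

score = dni_score(acts, nouns)  # near 1.0: the unit tracks the hypothesis
```

A score near 1 (or -1) suggests the unit's behavior aligns with the hypothesis; a score near 0 suggests no relationship.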

Approaches for Interpretation

  1. Saliency Analysis
  2. Statistical Analysis
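Saliency analysis asks how sensitive a model's output is to each input feature, typically via gradients. A minimal sketch using finite differences stands in for autodiff here; the toy model and helper names are assumptions for illustration:

```python
import numpy as np

def model(x):
    # toy "network": a fixed weighted sum followed by tanh
    w = np.array([2.0, -0.5, 0.0])
    return np.tanh(w @ x)

def saliency(f, x, eps=1e-5):
    """Approximate |df/dx_i| per input feature via finite differences.
    (Real systems use autodiff; this is just the idea.)"""
    base = f(x)
    grads = np.zeros_like(x)
    for i in range(len(x)):
        bumped = x.copy()
        bumped[i] += eps
        grads[i] = (f(bumped) - base) / eps
    return np.abs(grads)  # magnitude = feature importance

s = saliency(model, np.array([0.1, 0.2, 0.3]))
# the first feature (largest weight) gets the highest saliency
```

Statistical analysis, by contrast, correlates unit behavior with hypothesis labels over a test set rather than attributing a single prediction to its inputs.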

DeepBase Optimizations

  • Shared Computation via Model Merging
  • Early Stopping
  • Streaming Behavior Extraction
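To make the early-stopping idea concrete: instead of scoring a hypothesis over the entire test set, score it on growing prefixes and stop once the estimate stabilizes. This is a hedged sketch of the concept only; the convergence test, thresholds, and names are assumptions, not DeepBase's actual algorithm:

```python
import numpy as np

def early_stopped_score(activations, labels, chunk=100, tol=0.01):
    """Estimate a correlation score on growing samples; stop early
    once consecutive estimates agree within `tol`."""
    prev = None
    for n in range(chunk, len(activations) + chunk, chunk):
        est = float(np.corrcoef(activations[:n], labels[:n])[0, 1])
        if prev is not None and abs(est - prev) < tol:
            return est, n          # converged: skip the remaining data
        prev = est
    return prev, len(activations)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=5000)
acts = labels + rng.normal(0, 0.1, size=5000)  # unit that tracks labels
score, seen = early_stopped_score(acts, labels)
```

When a unit's score converges quickly (or is clearly far from any threshold of interest), most of the test set never needs to be processed.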

Dissecting the Performance of Strongly Consistent Replication Protocols

Distributed Consensus

  • The fault-tolerant distributed consensus problem is addressed by Paxos and its variants.
  • The performance of Paxos protocols is critical for the overall performance of distributed databases and systems.
  • Paxos protocols show widely varying performance based on network, workload, deployment size, topology, and failures.

Data Platform for Machine Learning

ML Frameworks and Data

  • Most ML frameworks and platforms lack robust data management support.
  • ML data evolves in three ways:
    • Variety: Rich data types (images, text, sensor data).
    • Volume: Size of the dataset.
    • Velocity: Rate of change (slow for raw data, faster for annotations and features).

Future Work

  • Data exploration vs. catalog searches
  • Reducing data latency and human-in-the-loop time
  • Prefetching prediction and local buffering
  • Ecosystem integration
  • MLdp's own format (linear number of format connectors)
  • Native data formats (quadratic number of format connectors)
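The linear-versus-quadratic contrast in the last two bullets is simple counting: with one common intermediate format, each external format needs a single connector, whereas direct native-to-native conversion needs a connector for every ordered pair of formats. The counts below are illustrative:

```python
def connectors_via_hub(num_formats):
    # one connector per external format, all through the common format
    return num_formats

def connectors_pairwise(num_formats):
    # one connector for every ordered (source, target) pair
    return num_formats * (num_formats - 1)

for f in (3, 5, 10):
    print(f, connectors_via_hub(f), connectors_pairwise(f))
# with 10 formats: 10 connectors through a hub vs. 90 pairwise
```

This is why a platform-owned format keeps integration effort linear as the ecosystem of formats grows.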

HoloDetect: Few-Shot Learning for Error Detection

Motivation

  • Data errors are a persistent issue.
  • Analytical tasks require high-quality data; without it, the result is “garbage in, garbage out.”
  • Current solutions—handcrafted rule-based methods, pattern-driven methods, and outlier detection—often suffer from low accuracy, lack of generalization, and high maintenance costs.

Methodology

  • Training data acquisition
  • Data augmentation
  • Transformation learning
  • Generating error samples
  • Experiments: Compared methods and model variants
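The "generating error samples" step can be sketched as follows: corrupt clean cell values with plausible transformations (typos, truncation) so the training set is no longer dominated by clean examples. The specific corruption functions below are illustrative assumptions; HoloDetect itself *learns* such transformations from example errors rather than hard-coding them:

```python
import random

def swap_chars(v):
    """Transposition typo, e.g. "Chicago" -> "hCicago"."""
    if len(v) < 2:
        return v
    i = random.randrange(len(v) - 1)
    return v[:i] + v[i + 1] + v[i] + v[i + 2:]

def drop_suffix(v):
    """Truncation error, e.g. "60614-3113" -> "60614"."""
    return v.split("-")[0]

CORRUPTIONS = [swap_chars, drop_suffix]

def augment(clean_values, n_errors):
    """Return (value, is_error) pairs mixing clean and corrupted cells."""
    random.seed(42)  # deterministic for the example
    samples = [(v, 0) for v in clean_values]
    for _ in range(n_errors):
        v = random.choice(clean_values)
        samples.append((random.choice(CORRUPTIONS)(v), 1))
    return samples

data = augment(["Chicago", "60614-3113", "Illinois"], n_errors=3)
```

The resulting balanced sample of clean and erroneous cells can then train an ordinary binary classifier for error detection.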

Conclusion

Error detection can be effectively handled as an ML problem. Training data acquisition is a significant challenge due to imbalanced data, but generative processes can be used to augment data by synthesizing erroneous samples. HoloDetect achieved an average precision of ~94% and an average recall of ~93% across a diverse array of datasets.
