Google Cloud Platform Solutions: Data, AI, and Infrastructure Best Practices
BigQuery for Predictive Modeling & Geospatial Data
- Petabyte-scale data warehousing
- GeoJSON and geospatial processing
- Predictive modeling capabilities
IoT Data Processing with Kafka & Dataflow Alerts
- Kafka I/O as input stream
- Dataflow for stream ingestion and windowed processing
- Alerting if moving average falls below 4,000 messages
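The alerting rule above can be sketched in plain Python. This is only an illustration of the moving-average logic, not a Dataflow pipeline: the class name `MovingAverageAlert`, the window size of 3, and the sample counts are all hypothetical, and in a real deployment the windowing would be handled by the streaming engine.

```python
from collections import deque


class MovingAverageAlert:
    """Tracks the mean of the most recent `window_size` message counts."""

    def __init__(self, window_size: int, threshold: float = 4000.0):
        # deque(maxlen=...) drops the oldest count automatically.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def add(self, message_count: int) -> bool:
        """Record one interval's count; return True if the alert should fire."""
        self.window.append(message_count)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        average = sum(self.window) / len(self.window)
        return average < self.threshold


monitor = MovingAverageAlert(window_size=3)
alerts = [monitor.add(count) for count in [5000, 4500, 4200, 3000, 2500]]
print(alerts)  # [False, False, False, True, True]
```

The alert first fires on the fourth sample, when the three-interval average drops to 3,900.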
Cloud SQL for MySQL: High Availability & Zone Failure
- Utilizing failover replicas
- Deployment in a different zone within the same region
- Read replicas for scaling read-heavy workloads
Kafka for Centralized Data Ingestion & Delivery
- Ability to manage offsets within topics
- Publish/subscribe across multiple topics
- Preserving per-key ordering and retaining messages for extended periods
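The three properties above can be illustrated with a toy in-memory log. This is not the Kafka client API: `ToyTopic` and its methods are hypothetical, and CRC32 stands in for Kafka's own partitioner purely to keep the sketch deterministic.

```python
import zlib


class ToyTopic:
    """A toy append-only topic: a list of partitions, each an ordered log."""

    def __init__(self, num_partitions: int = 2):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key: str, value: str) -> tuple:
        # The same key always maps to the same partition,
        # so messages for one key keep their publish order.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read_from(self, partition: int, offset: int) -> list:
        # Messages are retained after being read, so a consumer
        # can seek back to any earlier offset and replay from there.
        return self.partitions[partition][offset:]


topic = ToyTopic()
positions = [topic.publish("sensor-1", f"reading-{i}") for i in range(3)]
partition = positions[0][0]
replayed = topic.read_from(partition, 1)
```

Here `replayed` holds the second and third readings for `sensor-1`, in the order they were published.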
Apache Hadoop Migration to Google Cloud Platform
- Cost-effective node storage using standard Persistent Disks
- Running 50% of the workers as preemptible VMs for cost savings
- Google Cloud Storage (GCS) for data storage
- Utilizing Dataproc clusters
Improving AUC Score: Strategies for Model Optimization
- Employing hyperparameter tuning
Dataproc Cluster Security: Managing Dependencies Offline
- Moving dependencies to Google Cloud Storage (GCS)
- Ensuring GCS is within a VPC Service Controls perimeter
Scalable, Transactional SQL Database for 6 TB Workloads
- Solution: Cloud Spanner
On-Premises to GCP Database Migration: 20 TB OLTP
- Solution: Cloud SQL
Database for Collecting CPU and Memory Statistics
- Bigtable with a narrow table design
- Row Key structure: <comp_engine, comp_id, comp_timestamp>
- Data collected every second
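A minimal sketch of how such a row key might be assembled. The `#` separator, the zero-padded epoch-seconds encoding, and the sample values are assumptions made for illustration; the notes specify only the three components and their order.

```python
from datetime import datetime, timezone


def make_row_key(comp_engine: str, comp_id: str, sample_time: datetime) -> bytes:
    """Build a row key of the form <comp_engine>#<comp_id>#<comp_timestamp>."""
    # Zero-padded epoch seconds sort chronologically under Bigtable's
    # lexicographic row ordering, so one machine's samples stay contiguous.
    epoch_seconds = int(sample_time.timestamp())
    return f"{comp_engine}#{comp_id}#{epoch_seconds:010d}".encode("utf-8")


first = make_row_key("gce", "vm-0042", datetime(2024, 1, 1, 0, 0, 1, tzinfo=timezone.utc))
second = make_row_key("gce", "vm-0042", datetime(2024, 1, 1, 0, 0, 2, tzinfo=timezone.utc))
print(first)  # b'gce#vm-0042#1704067201'
```

With one sample per second, each row in the narrow table holds a single measurement, and a prefix scan on `gce#vm-0042#` returns that machine's samples in time order.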
GCS Data Security: Implementing a "Trust No One" Policy
- Using gcloud kms for symmetric key management
- Encrypting files with KMS keys and unique Additional Authenticated Data (AAD)
- Uploading to GCS using gsutil cp
- Storing AAD outside of Google Cloud Platform
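The steps above might translate into the following command sequence, assembled here as argument lists rather than executed. The key, key ring, location, file, and bucket names are placeholders, the key ring is assumed to exist already, and the flags should be checked against the current gcloud and gsutil references before use.

```python
def build_commands(key, keyring, location, plaintext_file, aad_file, bucket):
    """Return the create-key, encrypt, and upload commands as argument lists."""
    ciphertext_file = plaintext_file + ".enc"
    create_key = [
        "gcloud", "kms", "keys", "create", key,
        "--keyring", keyring, "--location", location,
        "--purpose", "encryption",  # symmetric encrypt/decrypt key
    ]
    encrypt = [
        "gcloud", "kms", "encrypt",
        "--key", key, "--keyring", keyring, "--location", location,
        "--plaintext-file", plaintext_file,
        "--ciphertext-file", ciphertext_file,
        # The same AAD must be supplied again at decryption time.
        "--additional-authenticated-data-file", aad_file,
    ]
    upload = ["gsutil", "cp", ciphertext_file, f"gs://{bucket}/"]
    return [create_key, encrypt, upload]


commands = build_commands(
    key="archive-key",            # placeholder names throughout
    keyring="archive-ring",
    location="global",
    plaintext_file="archive.tar",
    aad_file="archive.aad",
    bucket="example-archive-bucket",
)
for command in commands:
    print(" ".join(command))
```

Only the ciphertext is uploaded. Because decryption requires the matching AAD, keeping the AAD file outside Google Cloud means the stored object cannot be decrypted with the KMS key alone.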
Managed Services for Performance & Failure Alerting
- Cloud Monitoring with alerting policies
Data Structuring for BigQuery ML Linear Regression: State and City
- One-hot encoding using SQL (avoiding Data Fusion)
- Representing states as rows
- Representing cities as columns
- Using 1 or 0 for encoding values
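The layout that one-hot encoding produces can be shown with a small Python sketch. In BigQuery the transformation itself would be written in SQL; the function `one_hot_encode` and the sample records are hypothetical and exist only to show the shape of the result.

```python
def one_hot_encode(records, field):
    """Replace `field` with one 0/1 column per distinct value of that field."""
    categories = sorted({record[field] for record in records})
    encoded = []
    for record in records:
        # Keep every other column unchanged.
        row = {k: v for k, v in record.items() if k != field}
        for category in categories:
            row[f"{field}_{category}"] = 1 if record[field] == category else 0
        encoded.append(row)
    return encoded


records = [
    {"state": "CA", "city": "Fresno"},
    {"state": "CA", "city": "Oakland"},
    {"state": "TX", "city": "Austin"},
]
encoded = one_hot_encode(records, "city")
print(encoded[0])
# {'state': 'CA', 'city_Austin': 0, 'city_Fresno': 1, 'city_Oakland': 0}
```

Each city becomes its own column, and every row carries exactly one 1 among those columns.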
ACID-Compliant SQL Database for Bank Transactions
- Cloud Spanner, which offers ACID compliance and locking read-write transactions for consistency and conflict resolution
- Important: Avoid stale reads, as they can return outdated data for critical bank transactions
BigQuery Query Performance & Partition Restructuring
- Transitioning from ingestion-date partitioning to partitioning on the item ID, so that queries filtering by ID scan fewer partitions