Industry: SECURITY
The proper way to be more data-driven – cutting data processing costs by 40%
Introduction: Faced with growing data complexity and aiming to feed a larger share of the gathered data back into its product, a leading pentesting platform decided to optimize its data infrastructure, governance, and business intelligence systems with Insightify.
With efficiency and scalability as the main focus, the Insightify team jumped in to lay the groundwork for expanding the company's data infrastructure and unlocking its full potential for machine learning (ML) integration.
Challenges uncovered: The pentesting platform was experiencing inefficiencies due to over-partitioned data tables, high ETL costs, and underutilized infrastructure. Key challenges included:
- Data management inefficiencies: Over-partitioned BigQuery tables caused slow queries and increased metadata overhead.
- High operational costs: Inefficient Apache Airflow usage and redundant processes inflated expenses.
- Data quality and scalability issues: The lack of automated tests and real-time data processes created bottlenecks in delivering actionable insights.
- BI and governance limitations: Existing BI tools struggled to support the growing team's demands, and manual IAM processes created governance challenges.
Solutions we came up with:
Optimizing data storage and processing:
- Reduced table partitions by over 97%, enabling faster queries and minimizing metadata overhead.
- Rewrote existing SQL queries to enable partition pruning in BigQuery (see the sketch after this list).
- Introduced clustering to tables, leading to shorter query response times.
- Shifted to flat-rate BigQuery pricing, cutting data processing costs by approximately 40%.
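A minimal sketch of what the partitioning, pruning, and clustering changes can look like in BigQuery, using the google-cloud-bigquery Python client. All dataset, table, and column names (analytics.scan_events, event_date, customer_id, scan_type) are illustrative, not the client's actual schema:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials

# Rebuild a daily-partitioned table as a monthly-partitioned, clustered
# one: far fewer partitions, far less metadata overhead.
ddl = """
CREATE TABLE analytics.scan_events_v2
PARTITION BY DATE_TRUNC(event_date, MONTH)  -- monthly instead of daily partitions
CLUSTER BY customer_id, scan_type           -- clustering narrows scans within a partition
AS SELECT * FROM analytics.scan_events
"""
client.query(ddl).result()

# Pruning only kicks in when the filter is a constant expression on the
# partition column itself, so queries were rewritten into that shape:
pruned_sql = """
SELECT customer_id, COUNT(*) AS findings
FROM analytics.scan_events_v2
WHERE event_date >= DATE '2024-01-01'  -- constant filter on the partition column
GROUP BY customer_id
"""
for row in client.query(pruned_sql).result():
    print(row.customer_id, row.findings)
```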
Improved workflow management (big time):
- Replaced underutilized Apache Airflow DAGs with lighter tools (a minimal stand-in is sketched after this list), reducing associated costs by over 85%.
- Introduced robust testing and local development workflows to accelerate time-to-deployment by 60%.
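What "lighter" can mean in practice: for a one-task DAG, a plain Python script run by cron or a scheduled Cloud Run job often suffices. A minimal sketch, assuming BigQuery as the warehouse and illustrative table names:

```python
"""Minimal replacement for a single-task Airflow DAG: one load step,
triggered by cron or a Cloud Run job. Table names are illustrative."""
from google.cloud import bigquery

def load_daily_summary() -> None:
    client = bigquery.Client()
    # A multi-statement query keeps the step safely re-runnable:
    # clear yesterday's slice, then rebuild it.
    job = client.query("""
        DELETE FROM analytics.daily_summary
        WHERE day = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
        INSERT INTO analytics.daily_summary (day, events)
        SELECT DATE(event_ts), COUNT(*)
        FROM analytics.raw_events
        WHERE DATE(event_ts) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
        GROUP BY 1;
    """)
    job.result()  # block so the scheduler sees a failure as a non-zero exit

if __name__ == "__main__":
    load_daily_summary()
```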
Enhanced data governance:
- Centralized user management through Google Groups, streamlining access control and reducing manual effort by over 70%.
- Implemented lifecycle policies for storage buckets (sketched below), lowering long-term storage expenses by 30%.
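The lifecycle rules themselves are only a few lines with the google-cloud-storage client; the bucket name and age thresholds here are illustrative:

```python
from google.cloud import storage

client = storage.Client()  # assumes default GCP credentials
bucket = client.get_bucket("example-scan-archives")  # hypothetical bucket

# Demote objects to cheaper Nearline storage after 30 days,
# then delete them outright after a year.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_delete_rule(age=365)
bucket.patch()  # push the updated lifecycle configuration to GCS
```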
Business intelligence made clearer:
- Enhanced Looker Studio’s performance, enabling the platform to support 50% more users with consistent speed and reliability.
- Automated dbt testing processes (an equivalent check is sketched below), reducing errors in production by 35%.
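dbt declares these tests in YAML; as a language-consistent illustration, here is what its not_null and unique checks amount to when run directly against BigQuery, with illustrative table and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hand-rolled equivalents of dbt's not_null and unique tests.
checks = {
    "null customer_id": """
        SELECT COUNT(*) AS bad
        FROM analytics.scan_events_v2
        WHERE customer_id IS NULL
    """,
    "duplicate event_id": """
        SELECT COUNT(*) AS bad FROM (
            SELECT event_id
            FROM analytics.scan_events_v2
            GROUP BY event_id
            HAVING COUNT(*) > 1
        )
    """,
}

for name, sql in checks.items():
    bad = next(iter(client.query(sql).result())).bad
    assert bad == 0, f"data quality check failed: {name} ({bad} rows)"
```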
Future-proofing with AI and ML:
- Integrated AutoML and Vertex AI, enabling more strategic resource allocation in the company's R&D projects.
Results:
- Cost savings: Achieved a 35% reduction in monthly operational expenses through optimized workflows and storage management.
- Improved efficiency: Query times decreased by 60% with better partitioning and clustering strategies.
- Scalability and quality: Enhanced governance and automated testing improved system reliability and supported scaling efforts.
- Actionable insights: The platform now delivers faster, more reliable insights, driving more informed decision-making and strategic planning.
Before you venture into costly ML projects that complement your already successful product, do yourself a favour and revisit your data infrastructure with an eye on strategic optimizations and scalability.
Five days are enough to start transforming your infrastructure and analytics capabilities. These changes not only reduce costs but also position you for future growth, providing a solid foundation for advanced AI and machine learning applications.
At the end of the day, you're only as good as the data you feed your systems.