Data integration for Hive DB with Cloudera for efficient data processing
For a large Asian bank
CLIENT & PROBLEM STATEMENT
- A large Asian bank needed data engineering support for its machine learning (ML) models.
- The bank required integration of Hive DB with Cloudera for efficient data processing and ML model creation.
APPROACH
- G-Square extracted 12 million credit card records spanning 6 years from a Hive database and transferred them to Cloudera for ML processing using PySpark.
- The team developed and optimized ETL processes to ensure smooth data extraction and transformation.
- R and Python were used for continuous data fetching to support ongoing ML model development.
- The resulting models were seamlessly integrated back into Hive for further analysis and business use.
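The extract-and-publish flow described above can be sketched in PySpark. This is a minimal illustration under stated assumptions, not the project's actual code: the table name, the `txn_date` column, the Parquet output path, and the choice to pull data in yearly windows are all hypothetical. Only standard PySpark calls are used (`SparkSession.builder.enableHiveSupport()`, `spark.sql`, `DataFrame.write`).

```python
from datetime import date


def yearly_windows(start_year, n_years):
    """Return half-open [start, end) date pairs, one per calendar year."""
    return [(date(y, 1, 1), date(y + 1, 1, 1))
            for y in range(start_year, start_year + n_years)]


def build_extract_query(table, start, end, date_col="txn_date"):
    """Build a Hive SQL query for one date window.

    `table` and `date_col` are hypothetical names; they must come from
    trusted configuration, since they are interpolated into the SQL text.
    """
    return (f"SELECT * FROM {table} "
            f"WHERE {date_col} >= '{start.isoformat()}' "
            f"AND {date_col} < '{end.isoformat()}'")


def make_hive_session(app_name="card-etl"):
    """Create a Spark session that can read and write Hive tables."""
    # Imported lazily so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .enableHiveSupport()
            .getOrCreate())


def extract_to_parquet(spark, table, out_path, start_year, n_years):
    """Pull records from Hive one year at a time and append them as Parquet."""
    for start, end in yearly_windows(start_year, n_years):
        df = spark.sql(build_extract_query(table, start, end))
        df.write.mode("append").parquet(out_path)


def publish_scores(scores_df, hive_table):
    """Write model output back to a Hive table for downstream analysis."""
    scores_df.write.mode("overwrite").saveAsTable(hive_table)
```

A run over six years might call `extract_to_parquet(make_hive_session(), "cards.transactions", "/data/cards", 2018, 6)`, where every argument value is a placeholder. Splitting the extraction into windows keeps each Hive query bounded; whether that suits a given cluster depends on how the source table is partitioned.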
SOLUTION & OUTPUT
- The optimized ETL processes and ML models provided real-time insights, enhancing fraud detection, customer behaviour analysis, and credit risk assessment.
- The integration into Hive allowed for streamlined access to actionable insights, improving overall decision-making and operational efficiency.
