Machine learning engineer

Hemanth Sai Kosari

About

🚀 Machine Learning Engineer with a strong focus on LLMs, Computer Vision, and scalable ML systems.

💡 Experienced in building end-to-end pipelines, from data processing and feature engineering to model optimization and deployment, with proven impact on real-world problems.

⚙️ Passionate about designing efficient, production-ready AI systems that combine performance with interpretability.


Experience

Roles in ML engineering, computer vision, and data systems.

Coveur.ai

Machine Learning Co-op · Client: James River Insurance

Aug 2025 – Dec 2025 Houston, TX
- Built M&C quote-and-bind ML classifiers by converting unstructured legacy insurance records into structured datasets.
- Standardized multi-source policy data, aligned schemas, and ran EDA in WSL to detect drift and data inconsistencies.
- Engineered feature pipelines with interaction terms and exhaustive encodings to generate model inputs.
- Ran automated hyperparameter search with Hyperopt and Optuna to maximize PR-AUC.
- Benchmarked XGBoost, CatBoost, and GPU-accelerated cuML with stratified CV across Recall, F1, and PR-AUC.
- Shipped training pipelines that improved Recall by 16% and PR-AUC by 11%, supporting production retraining.
Coveur.ai

AI Engineer Intern

May 2025 – Aug 2025 Houston, TX
- Crawled Department of Insurance (DOI) sites across all 50 U.S. states for surplus-lines tax rules and compliance statutes.
- Parsed and normalized HTML, Markdown, and PDFs into structured JSONL instruction sets for LLM tuning.
- Applied token-aware chunking, deduplication, and metadata tagging to keep examples within context-window limits and stabilize training.
- Generated synthetic Q&A with Azure AI and prompt conditioning to broaden regulatory coverage.
- Fine-tuned Qwen3-14B with 4-bit quantization and SFT for domain compliance QA.
- Improved numerical reasoning accuracy by 17% and response consistency by 21% on internal benchmarks.
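
The chunking and deduplication steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project code: whitespace splitting stands in for a real model tokenizer, and the function names, window size, and overlap are hypothetical.

```python
import hashlib
from typing import Iterable, List


def chunk_by_tokens(text: str, max_tokens: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping chunks of at most max_tokens tokens.

    Assumption: whitespace tokens approximate the model's tokenizer.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def deduplicate(chunks: Iterable[str]) -> List[str]:
    """Drop exact duplicates after case and whitespace normalization, keeping order."""
    seen = set()
    unique = []
    for chunk in chunks:
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

In a real pipeline the whitespace split would be replaced by the target model's tokenizer so that chunk sizes match the actual context budget.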
Indian Institute of Technology

Software Engineer

Apr 2023 – Jan 2024 Hyderabad, India
- Curated and annotated 12K+ indoor images, adding 10 novel classes to extend a COCO-style taxonomy.
- Trained and fine-tuned YOLOv7 detectors, reaching 92% mAP under challenging lighting and occlusion.
- Ran temporal error analysis, robustness tests, and latency profiling before deployment.
Anurag University

Computer Vision Research Assistant

Sep 2022 – Dec 2022 Hyderabad, India
- Researched monocular and stereo depth estimation using intensity gradients and calibrated disparity triangulation.
- Applied epipolar geometry and disparity optimization, achieving under 5% depth error on 10K+ indoor pairs.
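
For a rectified stereo pair, disparity triangulation reduces to the standard relation Z = f · B / d. The sketch below illustrates that relation and a relative depth error check; the focal length, baseline, and function names are illustrative assumptions, not values from the project.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in meters for a rectified stereo pair: Z = f * B / d.

    disparity_px: horizontal pixel shift between the left and right views.
    focal_px:     focal length in pixels (from calibration).
    baseline_m:   distance between the two camera centers in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def relative_depth_error(estimated_m: float, true_m: float) -> float:
    """Return |estimated - true| / true as a fraction of the true depth."""
    if true_m <= 0:
        raise ValueError("true depth must be positive")
    return abs(estimated_m - true_m) / true_m
```

With a hypothetical 700-pixel focal length and a 0.1 m baseline, a 35-pixel disparity corresponds to a depth of 2 m; depth error grows as disparity shrinks, which is why distant points are harder to estimate.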
TMI Network

Software Engineering Intern

Sep 2022 – Dec 2022 Hyderabad, India
- Built a CNN-based ingestion and classification pipeline reaching 88% detection accuracy.
- Deployed with Docker and Kubernetes, cutting retrieval latency by 60% and saving 80+ hours per week.

Projects

Insurance analytics, healthcare AI, computer vision, and interpretable ML.

RAG · Geospatial · Insurance
P&C Insurance Copilot with RAG and Geospatial Risk Modeling
- Built an end-to-end insurance risk modeling system using XGBoost to generate ZIP-level risk scores across 45K+ locations.
- Aggregated and processed 638K+ multi-source records, including FEMA, crime, and weather data, for geospatial risk analysis.
- Engineered normalized features by fusing heterogeneous geospatial signals to improve predictive modeling performance.
- Designed scalable data pipelines for risk scoring and preprocessing, ensuring consistent model performance across regions.
- Developed a RAG pipeline over 500+ policy documents to enable contextual insurance insights and semantic querying.
- Reduced manual underwriting and analysis effort by approximately 70% through automated insight extraction and decision support.
- Python
- XGBoost
- RAG
- Geospatial
View on GitHub
Computer Vision · XAI
Explainable Object Detection
- Built an end-to-end framework for interpretable object detection by combining modern detection models with explainability and evaluation components.
- Integrated YOLOv8 for object localization and CLIP for visual–text alignment to validate detections against semantic class representations.
- Generated Grad-CAM heatmaps for saliency visualization and incorporated LLaVA to produce optional natural language explanations for predictions.
- Designed a multi-stage, per-detection pipeline covering localization, similarity scoring, explainability, and quality assessment based on alignment and faithfulness metrics.
- Developed a Streamlit-based web application for interactive visualization and analysis of detection results and explanations.
- Implemented CLI tools for batch processing, COCO-style evaluation, JSON export, and visualization to support scalable experimentation and demos.
- Python
- YOLOv8
- CLIP
- LLaVA
- Grad-CAM
- Streamlit
View on GitHub
Healthcare · RAG · Gemini
Health Risk Analyzer & Chat Assistant
- Built a full-stack Streamlit application that integrates structured health data (age, BMI, vitals, symptoms) with uploaded medical PDFs to generate personalized, document-aware insights.
- Leveraged Google Gemini 1.5 to produce structured health reports including vitals summary, key insights, recommendations, and risk categorization.
- Designed a RAG pipeline using MiniLM and SentenceTransformers for semantic retrieval and grounded question answering over medical records.
- Enabled contextual queries over historical health data (e.g., cholesterol trends) and implemented admin-level semantic search across user datasets.
- Evaluated system performance with BLEU-4 (~0.62), ROUGE-L (~0.71), and Precision@K (~0.87), and measured ~82% factual faithfulness through manual validation.
- Implemented clustering-based risk stratification (Low, Moderate, High), role-based patient and admin workflows, MongoDB for data storage, and Plotly dashboards for analytics and visualization.
- Python
- Streamlit
- MongoDB
- Gemini
- RAG
- SentenceTransformers
- Plotly
View on GitHub
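
For reference, the Precision@K retrieval metric quoted in this project can be computed along these lines. This is a minimal sketch using the common "hits in the top K divided by K" convention; the document IDs and function name are hypothetical, and the project's exact evaluation setup may differ.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of the top-k retrieved items that are relevant.

    retrieved: document IDs in ranked order, best first.
    relevant:  IDs judged relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k  # divides by k even if fewer than k items were retrieved
```

For example, if two of the top three retrieved chunks are relevant, Precision@3 is 2/3.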