Role & responsibilities
- Architect, implement, and harden on-prem GenAI stacks using open-source models (e.g., LLaMA, Mistral, Falcon) with GPU/CPU acceleration and secure networking.
- Design and develop Agentic AI systems (autonomous agents, tool use, workflow orchestration) to automate complex IT and business processes.
- Build Retrieval-Augmented Generation (RAG) pipelines with robust embedding strategies and relevance tuning.
- Fine-tune LLMs using parameter-efficient techniques and track experiments with MLflow.
- Develop secure backend services and RESTful APIs; implement SSL/TLS, OAuth2/OIDC, JWT, RBAC, CSRF protection.
- Integrate and operate solutions on multi-cloud platforms (Azure AI, AWS SageMaker, GCP Vertex AI); design hybrid patterns across on-prem and cloud.
- Containerize and orchestrate services with Docker, Helm, and Kubernetes; employ GitOps and IaC (Terraform) for repeatable deployments.
- Establish CI/CD pipelines (GitHub Actions or equivalent) for model and application delivery; enforce image hardening and vulnerability scanning.
- Implement observability using Prometheus, Grafana, ELK/EFK, and OpenTelemetry; define SLOs, alerts, and runbooks.
- Ensure Responsible AI compliance: privacy, safety, bias/variance assessments, transparency, human-in-the-loop oversight, and model governance.
- Collaborate with data engineering on ETL pipelines, data preprocessing, and secure data ingestion from external sources.
- Document architectures, threat models, test plans, and operational procedures; contribute to best practices and internal tooling.
Preferred candidate profile
- 5-8+ years of software/ML engineering; 3+ years focused on Generative AI/LLMs.
- Expertise in Python.
- Hands-on experience with Hugging Face Transformers, LangChain, and LlamaIndex; Llama Guard; prompt engineering and evaluation.
- Experience deploying and operating open-source LLMs on-prem (LLaMA, Mistral) and managing the model lifecycle.
- Strong grasp of RAG architectures and vector databases (e.g., Chroma).
- Proficiency in Kubernetes administration, Linux systems, and Docker/Helm; CI/CD (GitHub Actions) and GitOps.
- DevSecOps practices: Vault-based secrets, image hardening, vulnerability scanning.
- Observability & Monitoring: Prometheus, Grafana, ELK, OpenTelemetry.
- Security-first backend/API development: SSL/TLS, OAuth2/OIDC, JWT, RBAC, CSRF.
- Cloud platforms: Azure AI, AWS SageMaker, GCP Vertex AI; hybrid/multi-cloud design.
- Data engineering exposure: ETL, preprocessing; ORM familiarity; an RDBMS plus at least one NoSQL system.