
RAG Without Hallucinations: What Actually Works in Production

April 17, 2026·7 min read·By Waqas Raza

Retrieval-Augmented Generation (RAG) is supposed to solve the hallucination problem. Give the model relevant documents, and it answers from those documents instead of from potentially wrong training data.

In practice, RAG without guardrails just produces more confident hallucinations — the model blends retrieved content with invented content, and it sounds authoritative either way.

Here is what I have learned building production RAG systems that actually stay grounded.

Why naive RAG still hallucinates

The typical RAG pipeline:

  1. Embed the query
  2. Retrieve top-k chunks from the vector store
  3. Stuff chunks into the prompt
  4. Ask the model to answer

This works in demos. It fails in production for four reasons:

1. The retrieved chunks are irrelevant. Semantic similarity and relevance are not the same thing. A query about "cancellation policy" can retrieve chunks about "subscription management" that are topically adjacent but don't contain the answer.

2. The model ignores the context. LLMs are trained to be helpful. When the context does not contain the answer, many will answer anyway — from training data, from inference, or by making something up.

3. The chunks lack enough context. A chunk that starts mid-sentence or references "the table above" makes no sense in isolation. The model fills in the gaps. Sometimes correctly. Often not.

4. There is no citation enforcement. The model answers in its own words, synthesizing across chunks. The user has no way to verify what came from the documents vs. what was invented.

Fix 1: Hard grounding instructions

The single highest-leverage change is a clear system prompt that prohibits answering outside the context:

You are a knowledge base assistant. Answer only using the provided context sections.
If the context does not contain the answer, respond with exactly:
"I don't have enough information in the provided documents to answer this question."
Do not use your training knowledge. Do not guess. Do not infer beyond what is stated.

This alone cuts hallucination rate significantly. But the model can still drift under pressure — especially if the user rephrases the question or asks follow-ups. So grounding instructions are necessary but not sufficient.
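Wiring these instructions into a chat request is straightforward. Here is a minimal sketch — the `Chunk` type and the message shape are assumptions (modeled on the common chat-completion format), not part of any specific SDK:

```python
from dataclasses import dataclass

SYSTEM_PROMPT = (
    "You are a knowledge base assistant. Answer only using the provided context sections.\n"
    "If the context does not contain the answer, respond with exactly:\n"
    "\"I don't have enough information in the provided documents to answer this question.\"\n"
    "Do not use your training knowledge. Do not guess. Do not infer beyond what is stated."
)

@dataclass
class Chunk:
    chunk_id: str
    text: str

def build_messages(question: str, chunks: list[Chunk]) -> list[dict]:
    # Label each context section with its chunk ID so answers (and,
    # later, citations) can point back to a specific source.
    context = "\n\n".join(f"[{c.chunk_id}]\n{c.text}" for c in chunks)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Context sections:\n{context}\n\nQuestion: {question}"},
    ]
```

Keeping the grounding rules in the system message, and the context in the user message, makes it harder for follow-up turns to displace the instructions.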

Fix 2: Retrieval quality, not retrieval quantity

More chunks is not better. Irrelevant chunks increase the chance the model synthesizes across them and invents connections.

I tune three things:

Similarity threshold: only include chunks above a minimum similarity score. I typically start at 0.75 and tune from there. Chunks below threshold are dropped, even if you asked for top-5.

Chunk overlap: chunks with 10–15% overlap with their neighbours preserve sentence continuity. A 512-token chunk with 50-token overlap means the model always sees complete thoughts.

Metadata filtering: before semantic search, apply hard filters. A query about a specific product version should only search chunks tagged with that version — not all chunks, ranked by similarity.

# `vector_db` and `embed` are your vector store client and embedding
# function; `Chunk` is the stored chunk type.
def retrieve(query: str, filters: dict, threshold: float = 0.75) -> list[Chunk]:
    # Over-fetch (top_k=10), then drop anything below the similarity
    # threshold — fewer, relevant chunks beat a full top-k of noise.
    results = vector_db.query(
        query_embedding=embed(query),
        filter=filters,
        top_k=10,
    )
    return [r for r in results if r.score >= threshold]
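The overlap scheme described above can be sketched as a sliding-window chunker. This version operates on a pre-tokenized list for illustration — in practice you would tokenize with the embedding model's own tokenizer:

```python
def chunk_tokens(tokens: list[str], size: int = 512,
                 overlap: int = 50) -> list[list[str]]:
    # Slide a window of `size` tokens, stepping by size - overlap so
    # each chunk shares `overlap` tokens with its predecessor.
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```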

Fix 3: Refusal as a first-class feature

The model needs to be able to say "I don't know" — and the system needs to treat that as a success, not a failure.

I implement explicit refusal detection:

REFUSAL_PHRASES = [
    "i don't have enough information",
    "the provided documents don't contain",
    "i cannot answer this from the context",
]

def is_refusal(answer: str) -> bool:
    lower = answer.lower()
    return any(phrase in lower for phrase in REFUSAL_PHRASES)

When the model refuses, I log it. Refusals are signal — they tell you which queries your knowledge base doesn't cover, which helps you improve the content.

Treating refusals as failures (and fine-tuning or prompting to eliminate them) is how you get a system that never says "I don't know" but says confidently wrong things.

Fix 4: Citation enforcement

Every claim in the answer should be traceable to a source chunk. I enforce this structurally, not through prose instructions.

The model returns a structured output:

class AnswerWithCitations(BaseModel):
    answer: str
    citations: list[Citation]
    confidence: Literal["high", "medium", "low"]

class Citation(BaseModel):
    chunk_id: str
    quote: str  # exact quoted text from the source
    relevance: str  # one sentence explaining why this supports the answer

Requiring exact quotes is the key constraint. The model cannot quote something that is not in the retrieved context. This forces it to stay grounded or fail structured output validation.

When structured output validation fails, I retry once with a corrective prompt. If it fails again, I return a refusal with an error status.

Fix 5: Adversarial testing

Before deploying a RAG system, I run a set of adversarial queries:

  • Out-of-scope questions: questions the knowledge base doesn't cover. The system should refuse, not hallucinate.
  • Ambiguous questions: questions with multiple valid interpretations. The system should acknowledge ambiguity or ask for clarification.
  • Rephrased questions: the same question asked five different ways. Answers should be consistent.
  • Contradiction probes: if the knowledge base has conflicting information across documents, how does the system handle it?

A system that passes this suite with a high refusal rate on out-of-scope questions is better than one that attempts an answer every time.
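A harness for the first and third categories can be small. In this sketch, `ask` is an assumed callable that runs the full RAG pipeline end to end, and `is_refusal` is the detector from Fix 3, passed in so the harness stays self-contained:

```python
def run_adversarial_suite(ask, is_refusal,
                          out_of_scope: list[str],
                          rephrasings: list[str]) -> dict:
    # Count refusals on out-of-scope questions (higher is better here)
    # and check that rephrasings of one question converge on one answer.
    refused = sum(1 for q in out_of_scope if is_refusal(ask(q)))
    answers = {ask(q) for q in rephrasings}
    return {
        "refused_out_of_scope": refused,
        "consistent": len(answers) <= 1,
    }
```

Ambiguity and contradiction probes are harder to score automatically; I review those by hand.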


RAG is not a plug-and-play solution. It is a pipeline that requires careful design at every stage — retrieval quality, grounding instructions, refusal handling, citation enforcement, and adversarial testing. The demos make it look easy. Production reveals the gaps.

If you are building a RAG system and need someone who has worked through these problems in real deployments, I'm available on Upwork.

About the author

Waqas Raza

AI-Native Full-Stack Engineer. Top Rated on Upwork · $180K+ earned · 93% job success. I build production AI agents, LLM systems, Web3 platforms, and full-stack applications.

Hire me on Upwork