We evaluated Knitify Pet Health Veterinary against four Gemini configurations on 30 veterinary clinical queries. The veterinary domain exposes the most severe citation hallucination gap of any domain tested. Gemini 2.5 Flash fabricates 94% of veterinary citations. Even Gemini 3.0 Pro hallucinates 48% of references. Knitify maintains 98-100% citation fidelity across all tiers and query categories.
This has direct implications for veterinary practice: species-specific dosing, drug interactions, and contraindications differ significantly from human medicine, and fabricated references to non-existent veterinary studies cannot be verified or acted upon.
Citation fidelity measures whether each cited reference is a real paper that exists in PubMed and is relevant to the claim it supports.
How citation fidelity is measured: Each reference cited by Gemini is checked programmatically. DOIs are resolved via CrossRef and PMIDs via PubMed to retrieve the real paper. An independent AI verifier then compares the resolved paper against what was claimed — checking whether the topic, authors, and study match. If the DOI returns a 404 (paper does not exist) or the resolved paper is on a different topic, the citation is marked as hallucinated. Knitify citations are verified by the model's built-in quality assurance layer.
| System | Common | Complex | Niche | Emerging | Overall |
|---|---|---|---|---|---|
| Knitify Fast | 100% | 98% | 100% | 100% | 100% |
| Knitify Standard | 97% | 97% | 100% | 100% | 98% |
| Knitify Premium | 98% | 96% | 99% | 100% | 98% |
| Gemini 3.0 Pro | 40% | 48% | 62% | 58% | 52% |
| Gemini 3.0 Flash | 26% | 23% | 26% | 28% | 26% |
| Gemini 2.5 Pro | 20% | 20% | 32% | 26% | 24% |
| Gemini 2.5 Flash | 7% | 4% | 10% | 3% | 6% |
| Domain | Knitify (avg) | Gemini 3.0 Pro | Gemini 2.5 Flash |
|---|---|---|---|
| Medical | 99% | 72% | 28% |
| Scientific | 97% | 66% | 20% |
| Pet Health | 99% | 52% | 6% |
Average number of unique peer-reviewed sources cited per response.
| System | Common | Complex | Niche | Emerging | Overall |
|---|---|---|---|---|---|
| Knitify Fast | 9.0 | 7.6 | 4.3 | 10.7 | 8.8 |
| Knitify Standard | 13.5 | 7.2 | 5.3 | 12.0 | 10.6 |
| Knitify Premium | 22.2 | 13.6 | 10.1 | 19.7 | 16.5 |
| Gemini 3.0 Pro | 5.9 | 4.6 | 6.9 | 6.1 | 5.8 |
| Gemini 3.0 Flash | 7.1 | 8.0 | 6.7 | 7.4 | 7.3 |
| Gemini 2.5 Flash | 10.8 | 10.2 | 8.3 | 14.1 | 10.8 |
Breakdown of citation volume, recency, and journal breadth across all 7 systems.
| System | Total Refs | 2025-26 | 2024 | 2023 | ≤2022 | % Recent | Unique Journals |
|---|---|---|---|---|---|---|---|
| Knitify Fast | 476 | 150 | 68 | 38 | 220 | 31% | 65 |
| Knitify Standard | 574 | 192 | 58 | 32 | 292 | 33% | 77 |
| Knitify Premium | 992 | 218 | 70 | 82 | 622 | 21% | 102 |
| Gemini 2.5 Flash | 323 | 0 | 0 | 0 | 539 | 0% | 375 |
| Gemini 2.5 Pro | 178 | 0 | 0 | 1 | 327 | 0% | 419 |
| Gemini 3.0 Flash | 184 | 0 | 0 | 6 | 296 | 0% | 272 |
| Gemini 3.0 Pro | 174 | 0 | 0 | 0 | 167 | 0% | 141 |
| 3.0 Flash + Search | 404 | 12 | 15 | 26 | 351 | 2% | 368 |
| 3.0 Pro + Search | 183 | 6 | 3 | 7 | 167 | 3% | 209 |
Knitify top journals (verified via PubMed): Journal of Veterinary Internal Medicine (18), Journal of Feline Medicine and Surgery (12), Veterinary Record (8), Journal of the American Veterinary Medical Association (7).
Neither Knitify nor Gemini cited NEJM for this domain — veterinary queries draw from specialized journals like JVIM and JFMS rather than general medical journals.
Binary metric: no wrong dose >50%, no missed major contraindication, no wrong interaction severity.
| System | Common | Complex | Niche | Emerging | Overall |
|---|---|---|---|---|---|
| Knitify Fast | 100% | 100% | 100% | 100% | 100% |
| Knitify Premium | 100% | 100% | 100% | 100% | 100% |
| Knitify Standard | 100% | 86% | 100% | 100% | 96% |
| Gemini 3.0 Pro | 100% | 100% | 100% | 100% | 100% |
| Gemini 3.0 Flash | 100% | 100% | 100% | 100% | 100% |
| Gemini 2.5 Flash | 88% | 75% | 100% | 100% | 90% |
| System | Common | Complex | Niche | Emerging | Overall |
|---|---|---|---|---|---|
| Knitify Fast | 17.7s | 16.9s | 18.4s | 18.3s | 17.8s |
| Knitify Standard | 16.5s | 16.9s | 17.5s | 23.1s | 18.3s |
| Knitify Premium | 28.5s | 26.5s | 27.2s | 32.9s | 28.7s |
| Gemini 3.0 Flash* | 13.5s | 14.2s | 13.1s | 13.4s | 13.5s |
| Gemini 3.0 Pro* | 59.0s | 51.7s | 53.3s | 59.0s | 55.7s |
| Fast | Standard | Premium | |
|---|---|---|---|
| Best for | Quick clinical lookups | Case workup support | Literature reviews, CE |
| Citation Fidelity | 100% | 98% | 98% |
| References / response | 9 | 11 | 17 |
| Avg words | ~403 | ~735 | ~1,183 |
| Time to first token | 17.8s | 18.3s | 28.7s |
| Clinical safety | 100% | 96% | 100% |
Direct comparisons between matched tiers — same-class models, all metrics.
| Metric | Knitify Fast | Gemini 3.0 Flash |
|---|---|---|
| Citation Fidelity | 100% | 26% |
| References / response | 8.8 verified | 1.9 verified (5.4 fabricated) |
| % from 2025-2026 | 31% | 0% |
| Speed (TTFT) | 17.8s | 13.5s |
| Metric | Knitify Premium | Gemini 3.0 Pro |
|---|---|---|
| Citation Fidelity | 98% | 52% |
| References / response | 16.5 verified | 3.0 verified (2.8 fabricated) |
| % from 2025-2026 | 21% | 0% |
| Speed (TTFT) | 28.7s | 55.7s |
| Metric | Knitify Fast | Flash + Search |
|---|---|---|
| Citation Fidelity | 100% | 47% |
| References / response | 8.8 verified | 3.2 verified (3.7 fabricated) |
| % from 2025-2026 | 31% | 2% |
| Speed (TTFT) | 17.8s | 23.4s |
| Metric | Knitify Premium | Pro + Search |
|---|---|---|
| Citation Fidelity | 98% | 89% |
| References / response | 16.5 verified | 4.2 verified (0.5 fabricated) |
| % from 2025-2026 | 21% | 3% |
| Speed (TTFT) | 28.7s | 68.1s |
Species-specific accuracy matters. Drug dosing, toxicity thresholds, and metabolism differ between dogs, cats, and humans. Gemini 2.5 Flash makes dangerous errors (wrong doses, missed species contraindications) on 25% of complex pharmacotherapy queries. Knitify Fast/Premium achieve 100% clinical safety.
Verified references for client communication. Every Knitify citation links to a real PubMed paper that veterinarians can share with pet owners or referring specialists — no risk of linking to non-existent papers.
17 verified references per Premium response. Deep coverage of veterinary literature for case workup, client education, and continuing education.
Seven systems were evaluated: three Knitify tiers (Fast, Standard, Premium) and four Gemini configurations (2.5 Flash, 2.5 Pro, 3.0 Flash, 3.0 Pro). All systems received identical queries. Gemini systems were prompted to cite veterinary peer-reviewed sources with author, title, journal, year, and DOI or PubMed link.
Clinical safety scored by an independent AI judge. Citation fidelity verified by checking each reference against PubMed — independent of the AI judge.
Each cited reference is checked against PubMed to confirm it is a real, on-topic paper. A citation passes if (a) the identifier resolves to an existing paper and (b) the paper is relevant to the claim. Knitify citations are verified by the model's built-in quality assurance layer.
All Gemini systems received the following prompt template for each query:
You are a medical research assistant. Answer the following research question thoroughly. Support every claim with citations to peer-reviewed sources. For each citation, include: - First author et al. - Paper title - Journal name - Year of publication - DOI or PubMed link if available Target approximately [TARGET_WORDS] words for the main answer (excluding references). Format your references in a numbered list at the end. Research question: [QUERY]
Gemini models were called via the Gemini API (generativelanguage.googleapis.com) with the prompt above. No search grounding or retrieval tools were enabled — responses are generated entirely from model parameters.
Target word counts were matched to the corresponding Knitify tier to ensure comparable output length.
| # | Query |
|---|---|
| 1 | What is the evidence-based approach to feline hyperthyroidism treatment: radioiodine vs methimazole vs surgical thyroidectomy, including CKD unmasking risk? |
| 2 | What are the IRIS staging criteria for chronic kidney disease in cats and the evidence-based interventions at each stage? |
| 3 | What is the evidence for pimobendan in preclinical canine dilated cardiomyopathy from the PROTECT study and related trials? |
| 4 | What are the current heartworm disease treatment protocols in dogs including the melarsomine slow-kill protocol and pulmonary thromboembolism prevention? |
| 5 | What is the comparative efficacy of first-line anticonvulsants in canine epilepsy: phenobarbital vs potassium bromide vs levetiracetam? |
| 6 | What is the evidence for immune-mediated hemolytic anemia treatment in dogs including immunosuppression protocols and transfusion thresholds? |
| 7 | How should canine diabetes mellitus be managed — what insulin types (PZI, glargine, NPH) are used and how do remission rates compare to cats? |
| 8 | What is the current diagnostic and treatment approach to canine Cushing's disease including LDDS vs HDDS vs ACTH stimulation test accuracy? |
| # | Query |
|---|---|
| 9 | What are the comparative GI toxicity profiles of NSAIDs in dogs (carprofen, meloxicam, grapiprant, deracoxib) and the evidence for each? |
| 10 | What is the evidence for fluoroquinolone retinal toxicity in cats, the enrofloxacin dose threshold, and safe alternative antibiotics? |
| 11 | How do tramadol's analgesic effects differ between cats and dogs due to species differences in CYP2D6 metabolism and M1 metabolite production? |
| 12 | What is the evidence for phenobarbital drug interactions in dogs — CYP induction effects on cyclosporine, doxycycline, and thyroxine levels? |
| 13 | What is the comparative evidence for cyclosporine vs oclacitinib (Apoquel) for canine atopic dermatitis including onset of action and long-term safety? |
| 14 | What are the chemotherapy toxicity monitoring protocols in small animals for CHOP and carboplatin, including neutropenia nadir timing and dose modification? |
| 15 | What is the evidence for doxycycline esophageal stricture risk in cats and the recommended formulation and administration practices? |
| 16 | What is the evidence for gabapentin in cats for both pain management and anxiety, including dosing differences for each indication? |
| # | Query |
|---|---|
| 17 | What is the evidence for prescription renal diets in cats with CKD including phosphorus restriction levels and impact on survival? |
| 18 | What is the clinical evidence for high-protein low-carbohydrate diets in achieving diabetic remission in cats? |
| 19 | What is the evidence for omega-3 fatty acid supplementation in dogs with CKD — anti-inflammatory effects and survival data? |
| 20 | What is the evidence for medium-chain triglyceride diets in canine idiopathic epilepsy including clinical trial data and proposed mechanisms? |
| 21 | How should hydrolyzed protein diets vs novel protein diets be compared for food-responsive enteropathy in dogs? |
| 22 | What is the clinical trial evidence for Hill's y/d iodine-restricted diet in managing feline hyperthyroidism and its compliance limitations? |
| 23 | What are the evidence-based phosphate binder options (calcium carbonate, sevelamer, lanthanum) for feline CKD and their comparative efficacy? |
| # | Query |
|---|---|
| 24 | What is the current clinical evidence for GS-441524 and remdesivir in treating feline infectious peritonitis, including remission criteria and relapse rates? |
| 25 | What is the evidence for toceranib (Palladia) in canine mast cell tumors including response rates by Patnaik grade? |
| 26 | What is the evidence for metronomic chemotherapy with low-dose cyclophosphamide in dogs — mechanisms, indications, and urinary toxicity monitoring? |
| 27 | What is the current evidence for stem cell therapy in canine osteoarthritis including clinical trial data and regulatory status? |
| 28 | What is the evidence for BOAS (brachycephalic obstructive airway syndrome) surgical vs medical management outcomes and quality of life data? |
| 29 | What is the current evidence for antibiotic stewardship in veterinary medicine and the data on resistance transmission between animals and humans? |
| 30 | What are the current multimodal chronic pain management protocols for cats including evidence for gabapentin, buprenorphine, and meloxicam combinations? |