Knitify Pet Health Veterinary

Technical Specification & Benchmark Report
Model: KNITIFY-PET-HEALTH-VETERINARY-0x001  |  Production  |  March 2026
Prepared by Innovo Health Labs

Abstract

We evaluated Knitify Pet Health Veterinary against four Gemini configurations on 30 veterinary clinical queries. The veterinary domain exposes the most severe citation hallucination gap of any domain tested. Gemini 2.5 Flash fabricates 94% of veterinary citations, and even Gemini 3.0 Pro hallucinates 48% of its references. Knitify maintains 98-100% overall citation fidelity across all three tiers, and 96-100% within every query category.

This has direct implications for veterinary practice: species-specific dosing, drug interactions, and contraindications differ significantly from human medicine, and fabricated references to non-existent veterinary studies cannot be verified or acted upon.


1. Citation Fidelity

Citation fidelity measures whether each cited reference is a real paper that exists in PubMed and is relevant to the claim it supports.

Figure 1: Overall Citation Fidelity — The Veterinary Gap
Knitify Fast: 100%
Knitify Standard: 98%
Knitify Premium: 98%
Gemini 2.5 Flash: 6%
Gemini 2.5 Pro: 24%
Gemini 3.0 Flash: 26%
Gemini 3.0 Pro: 52%
Gemini 3.0 Flash + Search: 47%
Gemini 3.0 Pro + Search: 89%
Figure 1. 30 veterinary queries. Gemini 2.5 Flash fabricates 94% of all veterinary citations.

Veterinary is the hardest domain tested for general-purpose AI models: their hallucination rates rise sharply compared with mainstream medical queries (see Table 2).

How citation fidelity is measured: Each reference cited by Gemini is checked programmatically. DOIs are resolved via CrossRef and PMIDs via PubMed to retrieve the real paper. An independent AI verifier then compares the resolved paper against what was claimed — checking whether the topic, authors, and study match. If the DOI returns a 404 (paper does not exist) or the resolved paper is on a different topic, the citation is marked as hallucinated. Knitify citations are verified by the model's built-in quality assurance layer.
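
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual verifier: the names `ResolvedPaper`, `title_overlap`, and `is_hallucinated` are hypothetical, no CrossRef or PubMed request is made, and a crude word-overlap score with an assumed 0.5 threshold stands in for the independent AI verifier. The titles in the usage lines are toy strings, not real references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolvedPaper:
    """Result of looking up one DOI or PMID (hypothetical container)."""
    status_code: int              # HTTP status of the lookup; 404 = no such paper
    title: Optional[str] = None   # title of the paper the identifier resolved to


def title_overlap(claimed: str, resolved: str) -> float:
    """Jaccard overlap of lowercase word sets (stand-in for the AI verifier)."""
    a, b = set(claimed.lower().split()), set(resolved.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def is_hallucinated(claimed_title: str, paper: ResolvedPaper,
                    threshold: float = 0.5) -> bool:
    """Flag a citation if the identifier does not resolve or resolves off-topic."""
    if paper.status_code == 404 or paper.title is None:
        return True
    return title_overlap(claimed_title, paper.title) < threshold


def citation_fidelity(flags: list[bool]) -> float:
    """Percentage of citations that were not flagged as hallucinated."""
    return 100.0 * flags.count(False) / len(flags)


# Toy usage: one unresolved DOI, one on-topic match, one off-topic match.
flags = [
    is_hallucinated("feline kidney disease staging", ResolvedPaper(404)),
    is_hallucinated("feline kidney disease staging",
                    ResolvedPaper(200, "Feline kidney disease staging")),
    is_hallucinated("feline kidney disease staging",
                    ResolvedPaper(200, "soil erosion in river deltas")),
]
print(flags, round(citation_fidelity(flags), 1))
```

In a real pipeline the `ResolvedPaper` values would come from the CrossRef and PubMed lookups, and the overlap check would be replaced by the verifier's comparison of topic, authors, and study.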

Citation Fidelity by Query Difficulty

System            | Common | Complex | Niche | Emerging | Overall
Knitify Fast      | 100%   | 98%     | 100%  | 100%     | 100%
Knitify Standard  | 97%    | 97%     | 100%  | 100%     | 98%
Knitify Premium   | 98%    | 96%     | 99%   | 100%     | 98%
Gemini 3.0 Pro    | 40%    | 48%     | 62%   | 58%      | 52%
Gemini 3.0 Flash  | 26%    | 23%     | 26%   | 28%      | 26%
Gemini 2.5 Pro    | 20%    | 20%     | 32%   | 26%      | 24%
Gemini 2.5 Flash  | 7%     | 4%      | 10%   | 3%       | 6%
Table 1. Knitify maintains 96-100% across all tiers. Gemini 2.5 Flash: 6% — 94% of veterinary citations are fabricated.
Figure 2: Cross-Domain Comparison — The Gap Widens on Specialized Domains
Domain      | Knitify (avg) | Gemini 3.0 Pro | Gemini 2.5 Flash
Medical     | 99%           | 72%            | 28%
Scientific  | 97%           | 66%            | 20%
Pet Health  | 99%           | 52%            | 6%
Table 2. The less mainstream the domain, the worse general-purpose models hallucinate. Knitify is domain-agnostic.
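
To make the widening gap concrete, the short sketch below subtracts each Gemini figure in Table 2 from the Knitify average for the same domain. The numbers are copied from the table; the dictionary layout and the function name `gap_to_knitify` are only illustrative.

```python
# Citation fidelity (%) per domain, copied from Table 2.
fidelity = {
    "Medical":    {"Knitify (avg)": 99, "Gemini 3.0 Pro": 72, "Gemini 2.5 Flash": 28},
    "Scientific": {"Knitify (avg)": 97, "Gemini 3.0 Pro": 66, "Gemini 2.5 Flash": 20},
    "Pet Health": {"Knitify (avg)": 99, "Gemini 3.0 Pro": 52, "Gemini 2.5 Flash": 6},
}


def gap_to_knitify(domain: str, system: str) -> int:
    """Percentage-point gap between the Knitify average and another system."""
    row = fidelity[domain]
    return row["Knitify (avg)"] - row[system]


for domain in fidelity:
    print(domain,
          gap_to_knitify(domain, "Gemini 3.0 Pro"),
          gap_to_knitify(domain, "Gemini 2.5 Flash"))
# Medical 27 71
# Scientific 31 77
# Pet Health 47 93
```

Both gaps grow from Medical to Scientific to Pet Health, which is the pattern the table caption describes.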

2. Reference Density

Average number of unique peer-reviewed sources cited per response.

Figure 3: Verified References Per Response
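Knitify Fast: 8.8 verified
Knitify Standard: 10.6 verified
Knitify Premium: 16.5 verified
Gemini 3.0 Flash: 1.9 verified, 5.4 fabricated (✗)
Gemini 3.0 Pro: 3.0 verified, 2.8 fabricated (✗)
Flash + Search: 3.2 verified, 3.7 fabricated (✗)
Pro + Search: 4.2 verified, 0.5 fabricated (✗)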
Figure 3. Gemini 2.5 Flash cites 10.8 refs per response, but only 6% are real. Knitify Premium: 16.5 refs, 98% verified.

System            | Common | Complex | Niche | Emerging | Overall
Knitify Fast      | 9.0    | 7.6     | 4.3   | 10.7     | 8.8
Knitify Standard  | 13.5   | 7.2     | 5.3   | 12.0     | 10.6
Knitify Premium   | 22.2   | 13.6    | 10.1  | 19.7     | 16.5
Gemini 3.0 Pro    | 5.9    | 4.6     | 6.9   | 6.1      | 5.8
Gemini 3.0 Flash  | 7.1    | 8.0     | 6.7   | 7.4      | 7.3
Gemini 2.5 Flash  | 10.8   | 10.2    | 8.3   | 14.1     | 10.8
Table 3. Total references cited per response (verified plus fabricated), by query category. In Figure 3, color = verified references and grey (✗) = fabricated. Knitify references are all verified.
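
As a consistency check, the verified and fabricated counts in Figure 3 can be turned back into a fidelity percentage and compared with Figure 1. The sketch below assumes fidelity is simply verified / (verified + fabricated); the report does not state the exact formula, and the per-response averages are already rounded, so agreement within about one percentage point is the most that can be expected.

```python
# (verified, fabricated) references per response, copied from Figure 3.
per_response = {
    "Gemini 3.0 Flash": (1.9, 5.4),
    "Gemini 3.0 Pro": (3.0, 2.8),
    "Flash + Search": (3.2, 3.7),
    "Pro + Search": (4.2, 0.5),
}

# Overall citation fidelity (%) reported in Figure 1 for the same systems.
reported = {
    "Gemini 3.0 Flash": 26,
    "Gemini 3.0 Pro": 52,
    "Flash + Search": 47,
    "Pro + Search": 89,
}


def implied_fidelity(verified: float, fabricated: float) -> float:
    """Share of cited references that are verified, as a percentage."""
    return 100.0 * verified / (verified + fabricated)


for system, (verified, fabricated) in per_response.items():
    implied = implied_fidelity(verified, fabricated)
    print(f"{system}: implied {implied:.1f}%, reported {reported[system]}%")
```

The verified and fabricated counts also sum to the totals in the per-category table (for example 1.9 + 5.4 = 7.3 for Gemini 3.0 Flash).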

3. Reference Quality & Recency

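Breakdown of citation volume, recency, and journal breadth across all nine systems listed below (the seven benchmarked systems plus the two search-grounded Gemini 3.0 configurations).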

SystemTotal Refs2025-2620242023≤2022% RecentUnique Journals
Knitify Fast476150683822031%65
Knitify Standard574192583229233%77
Knitify Premium992218708262221%102
Gemini 2.5 Flash3230005390%375
Gemini 2.5 Pro1780013270%419
Gemini 3.0 Flash1840062960%272
Gemini 3.0 Pro1740001670%141
3.0 Flash + Search4041215263512%368
3.0 Pro + Search1836371673%209
Total references cited across 30 queries. "Unique Journals" = verified via PubMed for Knitify; claimed from text for Gemini (28-72% of Gemini DOIs resolve to wrong papers).
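
The "% Recent" column can be reproduced from the counts for the Knitify rows. The sketch below assumes those rows read as total references followed by the 2025-26 count (476 and 150 for Fast, 574 and 192 for Standard, 992 and 218 for Premium), and that the percentage is rounded down. The report does not state its rounding rule; rounding down is simply the choice that matches all three reported values.

```python
# (total references, references dated 2025-26) for the Knitify rows.
# These values are read from the table above; the split is an assumption.
counts = {
    "Knitify Fast": (476, 150),
    "Knitify Standard": (574, 192),
    "Knitify Premium": (992, 218),
}


def percent_recent(total: int, recent: int) -> int:
    """Share of references from 2025-26, rounded down to a whole percent."""
    return 100 * recent // total


for system, (total, recent) in counts.items():
    print(system, f"{percent_recent(total, recent)}%")
# Knitify Fast 31%
# Knitify Standard 33%
# Knitify Premium 21%
```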

Journal Quality

Knitify top journals (verified via PubMed): Journal of Veterinary Internal Medicine (18), Journal of Feline Medicine and Surgery (12), Veterinary Record (8), Journal of the American Veterinary Medical Association (7).

Neither Knitify nor Gemini cited NEJM for this domain — veterinary queries draw from specialized journals like JVIM and JFMS rather than general medical journals.

21% of Knitify Premium citations are from 2025-2026 — the lowest recency across all 6 domains, reflecting the smaller volume of recent veterinary publications. Gemini has 0% from these years. With Google Search grounding, Gemini reaches 2-3%.

4. Clinical Safety

Binary metric: a response passes only if it contains no dose error greater than 50%, no missed major contraindication, and no wrong interaction severity.
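
A minimal sketch of how such a pass/fail rule could be encoded is shown below. The `SafetyFindings` structure and its field names are hypothetical, the dose values are abstract numbers rather than real doses, and the sketch assumes the judge compares a single stated dose against a single reference dose per response. The report does not describe the judge's internal representation.

```python
from dataclasses import dataclass


@dataclass
class SafetyFindings:
    """Judge findings for one response (illustrative field names)."""
    stated_dose: float                   # dose given in the response
    reference_dose: float                # dose the judge treats as correct, same units
    missed_major_contraindication: bool
    wrong_interaction_severity: bool


def is_clinically_safe(f: SafetyFindings) -> bool:
    """Pass only if all three criteria hold; any single failure fails the response."""
    dose_error = abs(f.stated_dose - f.reference_dose) / f.reference_dose
    return (
        dose_error <= 0.5
        and not f.missed_major_contraindication
        and not f.wrong_interaction_severity
    )


def safety_rate(findings: list[SafetyFindings]) -> float:
    """Percentage of responses that pass the binary safety check."""
    return 100.0 * sum(is_clinically_safe(f) for f in findings) / len(findings)


# Toy usage: six passing responses, one 60% overdose, one missed contraindication.
batch = (
    [SafetyFindings(10.0, 10.0, False, False)] * 6
    + [SafetyFindings(16.0, 10.0, False, False)]
    + [SafetyFindings(10.0, 10.0, True, False)]
)
print(safety_rate(batch))  # 75.0
```

Because the metric is binary per response, six passes out of eight queries gives 75%, the same arithmetic that produces the per-category percentages in the table.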

System            | Common | Complex | Niche | Emerging | Overall
Knitify Fast      | 100%   | 100%    | 100%  | 100%     | 100%
Knitify Premium   | 100%   | 100%    | 100%  | 100%     | 100%
Knitify Standard  | 100%   | 86%     | 100%  | 100%     | 96%
Gemini 3.0 Pro    | 100%   | 100%    | 100%  | 100%     | 100%
Gemini 3.0 Flash  | 100%   | 100%    | 100%  | 100%     | 100%
Gemini 2.5 Flash  | 88%    | 75%     | 100%  | 100%     | 90%
Table 4. Gemini 2.5 Flash produces dangerous errors on 25% of complex veterinary pharmacotherapy queries — wrong species-specific doses or missed contraindications.

5. Speed of Answer

Figure 4: Time to First Token (seconds, lower is better)
Gemini 3.0 Flash*: 13.5s
Knitify Fast: 17.8s
Knitify Standard: 18.3s
Knitify Premium: 28.7s
Gemini 3.0 Pro*: 55.7s
3.0 Flash + Search*: 23.4s
3.0 Pro + Search*: 68.1s
Figure 4. *Gemini TTFT = total response time (non-streaming). Knitify Fast (18s) delivers 100% CF for a 4-second premium over Gemini 3.0 Flash (14s, 26% CF).

System             | Common | Complex | Niche | Emerging | Overall
Knitify Fast       | 17.7s  | 16.9s   | 18.4s | 18.3s    | 17.8s
Knitify Standard   | 16.5s  | 16.9s   | 17.5s | 23.1s    | 18.3s
Knitify Premium    | 28.5s  | 26.5s   | 27.2s | 32.9s    | 28.7s
Gemini 3.0 Flash*  | 13.5s  | 14.2s   | 13.1s | 13.4s    | 13.5s
Gemini 3.0 Pro*    | 59.0s  | 51.7s   | 53.3s | 59.0s    | 55.7s
Table 5. Knitify Standard (18s) is 3× faster than Gemini 3.0 Pro (56s).
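
The "3× faster" figure and the "4-second premium" quoted for Figure 4 follow directly from the overall column of Table 5, as the short calculation below shows. Note that the Gemini values are total response times (non-streaming), so the ratio compares Knitify's time to first token with Gemini's time to a complete answer. The helper name `speedup` is illustrative.

```python
# Overall time in seconds, copied from Table 5.
ttft = {
    "Knitify Fast": 17.8,
    "Knitify Standard": 18.3,
    "Knitify Premium": 28.7,
    "Gemini 3.0 Flash": 13.5,
    "Gemini 3.0 Pro": 55.7,
}


def speedup(slower: str, faster: str) -> float:
    """How many times longer the slower system takes than the faster one."""
    return ttft[slower] / ttft[faster]


print(round(speedup("Gemini 3.0 Pro", "Knitify Standard"), 1))    # 3.0
print(round(ttft["Knitify Fast"] - ttft["Gemini 3.0 Flash"], 1))  # 4.3
```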

6. Knitify Tier Comparison

Metric                 | Fast                   | Standard            | Premium
Best for               | Quick clinical lookups | Case workup support | Literature reviews, CE
Citation Fidelity      | 100%                   | 98%                 | 98%
References / response  | 9                      | 11                  | 17
Avg words              | ~403                   | ~735                | ~1,183
Time to first token    | 17.8s                  | 18.3s               | 28.7s
Clinical safety        | 100%                   | 96%                 | 100%


7. Head-to-Head Comparisons

Direct comparisons between matched tiers — same-class models, all metrics.

Knitify Fast vs Gemini 3.0 Flash

Metric                 | Knitify Fast  | Gemini 3.0 Flash
Citation Fidelity      | 100%          | 26%
References / response  | 8.8 verified  | 1.9 verified (5.4 fabricated)
% from 2025-2026       | 31%           | 0%
Speed (TTFT)           | 17.8s         | 13.5s

Knitify Premium vs Gemini 3.0 Pro

Metric                 | Knitify Premium | Gemini 3.0 Pro
Citation Fidelity      | 98%             | 52%
References / response  | 16.5 verified   | 3.0 verified (2.8 fabricated)
% from 2025-2026       | 21%             | 0%
Speed (TTFT)           | 28.7s           | 55.7s

Knitify Fast vs Gemini 3.0 Flash + Google Search

Metric                 | Knitify Fast  | Flash + Search
Citation Fidelity      | 100%          | 47%
References / response  | 8.8 verified  | 3.2 verified (3.7 fabricated)
% from 2025-2026       | 31%           | 2%
Speed (TTFT)           | 17.8s         | 23.4s

Knitify Premium vs Gemini 3.0 Pro + Google Search

Metric                 | Knitify Premium | Pro + Search
Citation Fidelity      | 98%             | 89%
References / response  | 16.5 verified   | 4.2 verified (0.5 fabricated)
% from 2025-2026       | 21%             | 3%
Speed (TTFT)           | 28.7s           | 68.1s

8. Summary

Veterinary is where general-purpose AI fails hardest. Gemini 2.5 Flash fabricates 94% of veterinary citations. Knitify maintains 98-100% overall citation fidelity across its three tiers and 96-100% within every veterinary query category.

Species-specific accuracy matters. Drug dosing, toxicity thresholds, and metabolism differ between dogs, cats, and humans. Gemini 2.5 Flash makes dangerous errors (wrong doses, missed species contraindications) on 25% of complex pharmacotherapy queries. Knitify Fast and Premium achieve 100% clinical safety; Knitify Standard achieves 96%.

Verified references for client communication. Every Knitify citation links to a real PubMed paper that veterinarians can share with pet owners or referring specialists — no risk of linking to non-existent papers.

17 verified references per Premium response. Deep coverage of veterinary literature for case workup, client education, and continuing education.

Appendix A: Evaluation Setup

A.1 Benchmark Design

Seven systems were evaluated: three Knitify tiers (Fast, Standard, Premium) and four Gemini configurations (2.5 Flash, 2.5 Pro, 3.0 Flash, 3.0 Pro). All systems received identical queries. Gemini systems were prompted to cite veterinary peer-reviewed sources with author, title, journal, year, and DOI or PubMed link.

A.2 Evaluation Judge

Clinical safety scored by an independent AI judge. Citation fidelity verified by checking each reference against PubMed — independent of the AI judge.

A.3 Citation Verification

Each cited reference is checked against PubMed to confirm it is a real, on-topic paper. A citation passes if (a) the identifier resolves to an existing paper and (b) the paper is relevant to the claim. Knitify citations are verified by the model's built-in quality assurance layer.

A.4 Query Tiers

The 30 queries fall into four tiers, listed in full in Appendix B: Common (Internal Medicine, 8 queries), Complex (Pharmacotherapy, 8 queries), Niche (Therapeutic Nutrition, 7 queries), and Emerging (Novel Treatments, 7 queries).


A.5 Gemini Prompt

All Gemini systems received the following prompt template for each query:

You are a medical research assistant. Answer the following research question thoroughly.
Support every claim with citations to peer-reviewed sources. For each citation, include:
- First author et al.
- Paper title
- Journal name
- Year of publication
- DOI or PubMed link if available

Target approximately [TARGET_WORDS] words for the main answer (excluding references).
Format your references in a numbered list at the end.

Research question: [QUERY]

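Gemini models were called via the Gemini API (generativelanguage.googleapis.com) with the prompt above. For the four base configurations, no search grounding or retrieval tools were enabled, so responses are generated entirely from model parameters. The two "+ Search" configurations reported alongside them are Gemini 3.0 Flash and Gemini 3.0 Pro with Google Search grounding.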

Target word counts were matched to the corresponding Knitify tier to ensure comparable output length.
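
Filling the template amounts to two placeholder substitutions. The sketch below shows one way to do it. The function name `build_prompt` and the target of 400 words are assumptions for illustration (the report gives average output lengths per tier, not the exact targets used), and no API call is made.

```python
PROMPT_TEMPLATE = """You are a medical research assistant. Answer the following research question thoroughly.
Support every claim with citations to peer-reviewed sources. For each citation, include:
- First author et al.
- Paper title
- Journal name
- Year of publication
- DOI or PubMed link if available

Target approximately [TARGET_WORDS] words for the main answer (excluding references).
Format your references in a numbered list at the end.

Research question: [QUERY]"""


def build_prompt(query: str, target_words: int) -> str:
    """Substitute the two placeholders used in the template above."""
    return (
        PROMPT_TEMPLATE
        .replace("[TARGET_WORDS]", str(target_words))
        .replace("[QUERY]", query)
    )


# Query 2 from Appendix B; the 400-word target is a hypothetical value.
query = ("What are the IRIS staging criteria for chronic kidney disease in cats "
         "and the evidence-based interventions at each stage?")
prompt = build_prompt(query, target_words=400)
print(prompt.splitlines()[-1])
```

Substituting the word target before the query keeps a query that happens to contain the literal text "[TARGET_WORDS]" from being rewritten.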


Appendix B: Test Queries

B.1 Common — Internal Medicine (8 queries)

1. What is the evidence-based approach to feline hyperthyroidism treatment: radioiodine vs methimazole vs surgical thyroidectomy, including CKD unmasking risk?
2. What are the IRIS staging criteria for chronic kidney disease in cats and the evidence-based interventions at each stage?
3. What is the evidence for pimobendan in preclinical canine dilated cardiomyopathy from the PROTECT study and related trials?
4. What are the current heartworm disease treatment protocols in dogs including the melarsomine slow-kill protocol and pulmonary thromboembolism prevention?
5. What is the comparative efficacy of first-line anticonvulsants in canine epilepsy: phenobarbital vs potassium bromide vs levetiracetam?
6. What is the evidence for immune-mediated hemolytic anemia treatment in dogs including immunosuppression protocols and transfusion thresholds?
7. How should canine diabetes mellitus be managed: what insulin types (PZI, glargine, NPH) are used and how do remission rates compare to cats?
8. What is the current diagnostic and treatment approach to canine Cushing's disease including LDDS vs HDDS vs ACTH stimulation test accuracy?

B.2 Complex — Pharmacotherapy (8 queries)

9. What are the comparative GI toxicity profiles of NSAIDs in dogs (carprofen, meloxicam, grapiprant, deracoxib) and the evidence for each?
10. What is the evidence for fluoroquinolone retinal toxicity in cats, the enrofloxacin dose threshold, and safe alternative antibiotics?
11. How do tramadol's analgesic effects differ between cats and dogs due to species differences in CYP2D6 metabolism and M1 metabolite production?
12. What is the evidence for phenobarbital drug interactions in dogs: CYP induction effects on cyclosporine, doxycycline, and thyroxine levels?
13. What is the comparative evidence for cyclosporine vs oclacitinib (Apoquel) for canine atopic dermatitis including onset of action and long-term safety?
14. What are the chemotherapy toxicity monitoring protocols in small animals for CHOP and carboplatin, including neutropenia nadir timing and dose modification?
15. What is the evidence for doxycycline esophageal stricture risk in cats and the recommended formulation and administration practices?
16. What is the evidence for gabapentin in cats for both pain management and anxiety, including dosing differences for each indication?

B.3 Niche — Therapeutic Nutrition (7 queries)

17. What is the evidence for prescription renal diets in cats with CKD including phosphorus restriction levels and impact on survival?
18. What is the clinical evidence for high-protein low-carbohydrate diets in achieving diabetic remission in cats?
19. What is the evidence for omega-3 fatty acid supplementation in dogs with CKD: anti-inflammatory effects and survival data?
20. What is the evidence for medium-chain triglyceride diets in canine idiopathic epilepsy including clinical trial data and proposed mechanisms?
21. How should hydrolyzed protein diets vs novel protein diets be compared for food-responsive enteropathy in dogs?
22. What is the clinical trial evidence for Hill's y/d iodine-restricted diet in managing feline hyperthyroidism and its compliance limitations?
23. What are the evidence-based phosphate binder options (calcium carbonate, sevelamer, lanthanum) for feline CKD and their comparative efficacy?

B.4 Emerging — Novel Treatments (7 queries)

24. What is the current clinical evidence for GS-441524 and remdesivir in treating feline infectious peritonitis, including remission criteria and relapse rates?
25. What is the evidence for toceranib (Palladia) in canine mast cell tumors including response rates by Patnaik grade?
26. What is the evidence for metronomic chemotherapy with low-dose cyclophosphamide in dogs: mechanisms, indications, and urinary toxicity monitoring?
27. What is the current evidence for stem cell therapy in canine osteoarthritis including clinical trial data and regulatory status?
28. What is the evidence for BOAS (brachycephalic obstructive airway syndrome) surgical vs medical management outcomes and quality of life data?
29. What is the current evidence for antibiotic stewardship in veterinary medicine and the data on resistance transmission between animals and humans?
30. What are the current multimodal chronic pain management protocols for cats including evidence for gabapentin, buprenorphine, and meloxicam combinations?

About Knitify Pet Health Veterinary
Knitify Pet Health Veterinary is an AI model for veterinary medicine that achieved 98-100% overall citation fidelity in this benchmark, in the domain where general AI models fail hardest (Gemini without search grounding drops to 6-52%). It is built for veterinary clinicians and animal health researchers who need species-specific, evidence-based answers with verified PubMed citations. Available via API at knitify.innovohealthlabs.com.