Knitify Pet Health Veterinary

Technical Specification & Benchmark Report

Model: KNITIFY-PET-HEALTH-VETERINARY-0x001 | Production | March 2026
Prepared by Innovo Health Labs

Abstract

We evaluated Knitify Pet Health Veterinary against four Gemini configurations, plus two search-grounded Gemini 3.0 variants, on 30 veterinary clinical queries. The veterinary domain exposes the most severe citation hallucination gap of any domain tested. Gemini 2.5 Flash fabricates 94% of veterinary citations. Even Gemini 3.0 Pro hallucinates 48% of references. Knitify maintains 98-100% overall citation fidelity across all three tiers, and 96-100% within every query category.

This has direct implications for veterinary practice: species-specific dosing, drug interactions, and contraindications differ significantly from human medicine, and fabricated references to non-existent veterinary studies cannot be verified or acted upon.

1. Citation Fidelity

Citation fidelity measures whether each cited reference is a real paper that exists in PubMed and is relevant to the claim it supports.

Figure 1: Overall Citation Fidelity — The Veterinary Gap

System	Citation Fidelity
Knitify Fast	100%
Knitify Standard	98%
Knitify Premium	98%
Gemini 2.5 Flash	6%
Gemini 2.5 Pro	24%
Gemini 3.0 Flash	26%
Gemini 3.0 Pro	52%
Gemini 3.0 Flash + Search	47%
Gemini 3.0 Pro + Search	89%

Figure 1. 30 veterinary queries. Gemini 2.5 Flash fabricates 94% of all veterinary citations.

Veterinary is the hardest domain for general-purpose AI models among those tested: their citation fidelity falls well below what they achieve on mainstream medical queries (for Gemini 3.0 Pro, 52% versus 72%; see Table 2).

How citation fidelity is measured: Each reference cited by Gemini is checked programmatically. DOIs are resolved via CrossRef and PMIDs via PubMed to retrieve the real paper. An independent AI verifier then compares the resolved paper against what was claimed — checking whether the topic, authors, and study match. If the DOI returns a 404 (paper does not exist) or the resolved paper is on a different topic, the citation is marked as hallucinated. Knitify citations are verified by the model's built-in quality assurance layer.
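The verification rule described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual harness: the `Citation` structure and both function names are hypothetical, and the topic comparison (done by an AI verifier in the report) is reduced to a boolean supplied by the caller. The two URLs are the public CrossRef works endpoint and the NCBI E-utilities summary endpoint.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Public metadata endpoints for the two registries named in the text.
CROSSREF_WORKS = "https://api.crossref.org/works/"
PUBMED_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


@dataclass
class Citation:
    """One reference as claimed in a model response (hypothetical structure)."""
    claimed_title: str
    doi: Optional[str] = None
    pmid: Optional[str] = None


def lookup_url(citation: Citation) -> Optional[str]:
    """Return the URL to query for this citation, preferring the DOI."""
    if citation.doi:
        return CROSSREF_WORKS + quote(citation.doi, safe="/")
    if citation.pmid:
        return f"{PUBMED_ESUMMARY}?db=pubmed&id={citation.pmid}&retmode=json"
    return None  # no identifier; the report does not say how this case is scored


def classify(resolved: bool, topic_matches: bool) -> str:
    """Apply the stated rule: unresolved or off-topic means hallucinated."""
    if not resolved:
        return "hallucinated"  # e.g. the DOI lookup returned a 404
    return "verified" if topic_matches else "hallucinated"


# "10.1000/xyz" is a made-up DOI used only to show the URL shape.
print(lookup_url(Citation("Example title", doi="10.1000/xyz")))
print(classify(resolved=True, topic_matches=False))
```

Keeping the network lookup separate from `classify` means the decision rule can be checked without any HTTP calls.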

Citation Fidelity by Query Difficulty

System	Common	Complex	Niche	Emerging	Overall
Knitify Fast	100%	98%	100%	100%	100%
Knitify Standard	97%	97%	100%	100%	98%
Knitify Premium	98%	96%	99%	100%	98%
Gemini 3.0 Pro	40%	48%	62%	58%	52%
Gemini 3.0 Flash	26%	23%	26%	28%	26%
Gemini 2.5 Pro	20%	20%	32%	26%	24%
Gemini 2.5 Flash	7%	4%	10%	3%	6%

Table 1. Knitify maintains 96-100% across all tiers. Gemini 2.5 Flash: 6% — 94% of veterinary citations are fabricated.

Figure 2: Cross-Domain Comparison — The Gap Widens on Specialized Domains

Domain	Knitify (avg)	Gemini 3.0 Pro	Gemini 2.5 Flash
Medical	99%	72%	28%
Scientific	97%	66%	20%
Pet Health	99%	52%	6%

Table 2. The less mainstream the domain, the worse general-purpose models hallucinate. Knitify is domain-agnostic.

2. Reference Density

Average number of unique peer-reviewed sources cited per response.

Figure 3: Verified References Per Response

System	Verified refs	Fabricated refs (✗)
Knitify Fast	8.8	-
Knitify Standard	10.6	-
Knitify Premium	16.5	-
Gemini 3.0 Flash	1.9	5.4
Gemini 3.0 Pro	3.0	2.8
Flash + Search	3.2	3.7
Pro + Search	4.2	0.5

Figure 3. Gemini 2.5 Flash (not plotted) cites 10.8 refs per response, but only 6% are real. Knitify Premium: 16.5 refs, 98% verified.

System	Common	Complex	Niche	Emerging	Overall
Knitify Fast	9.0	7.6	4.3	10.7	8.8
Knitify Standard	13.5	7.2	5.3	12.0	10.6
Knitify Premium	22.2	13.6	10.1	19.7	16.5
Gemini 3.0 Pro	5.9	4.6	6.9	6.1	5.8
Gemini 3.0 Flash	7.1	8.0	6.7	7.4	7.3
Gemini 2.5 Flash	10.8	10.2	8.3	14.1	10.8

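Table 3. Total references cited per response (verified plus fabricated) by query category. In Figure 3, ✗ marks fabricated references and the remaining values are verified. Knitify references are all verified.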
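The per-response counts in Figure 3 can be cross-checked against the citation fidelity figures reported in Section 1. The sketch below assumes fidelity is simply verified references divided by all references cited; the function name is illustrative, not taken from the benchmark code.

```python
def fidelity_pct(verified: float, fabricated: float) -> int:
    """Share of cited references that are verified, as a rounded percentage."""
    total = verified + fabricated
    if total <= 0:
        raise ValueError("at least one reference is required")
    return round(100 * verified / total)


# (verified, fabricated) per response, taken from Figure 3.
figure_3 = {
    "Gemini 3.0 Flash": (1.9, 5.4),
    "Gemini 3.0 Pro": (3.0, 2.8),
    "Flash + Search": (3.2, 3.7),
    "Pro + Search": (4.2, 0.5),
}

for system, (verified, fabricated) in figure_3.items():
    print(f"{system}: {fidelity_pct(verified, fabricated)}%")
```

Under this assumption the Gemini 3.0 Flash, 3.0 Pro, and Pro + Search values reproduce the reported 26%, 52%, and 89%. Flash + Search comes out one point below the reported 47%, which is plausible given that the inputs are themselves rounded to one decimal.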

3. Reference Quality & Recency

Breakdown of citation volume, recency, and journal breadth across all 9 system configurations.

System	Total Refs	2025-26	2024	2023	≤2022	% Recent	Unique Journals
Knitify Fast	476	150	68	38	220	31%	65
Knitify Standard	574	192	58	32	292	33%	77
Knitify Premium	992	218	70	82	622	21%	102
Gemini 2.5 Flash	323	0	0	0	539	0%	375
Gemini 2.5 Pro	178	0	0	1	327	0%	419
Gemini 3.0 Flash	184	0	0	6	296	0%	272
Gemini 3.0 Pro	174	0	0	0	167	0%	141
3.0 Flash + Search	404	12	15	26	351	2%	368
3.0 Pro + Search	183	6	3	7	167	3%	209

Total references cited across 30 queries. "Unique Journals" = verified via PubMed for Knitify; claimed from text for Gemini (28-72% of Gemini DOIs resolve to wrong papers).
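The "% Recent" column can be reproduced from the 2025-26 and Total Refs columns. The sketch below assumes the percentage is truncated rather than rounded; that is an inference from the table values (for example 218/992 is about 21.98%, reported as 21%), not something the report states.

```python
def pct_recent(refs_2025_26: int, total_refs: int) -> int:
    """Percentage of references from 2025-26, truncated to a whole number."""
    if total_refs <= 0:
        raise ValueError("total_refs must be positive")
    return (100 * refs_2025_26) // total_refs


# (2025-26 refs, total refs) for rows of the table above.
rows = {
    "Knitify Fast": (150, 476),
    "Knitify Standard": (192, 574),
    "Knitify Premium": (218, 992),
    "3.0 Flash + Search": (12, 404),
    "3.0 Pro + Search": (6, 183),
}

for system, (recent, total) in rows.items():
    print(f"{system}: {pct_recent(recent, total)}%")
```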

Journal Quality

Knitify top journals (verified via PubMed): Journal of Veterinary Internal Medicine (18), Journal of Feline Medicine and Surgery (12), Veterinary Record (8), Journal of the American Veterinary Medical Association (7).

Neither Knitify nor Gemini cited NEJM for this domain — veterinary queries draw from specialized journals like JVIM and JFMS rather than general medical journals.

21% of Knitify Premium citations are from 2025-2026 — the lowest recency across all 6 domains, reflecting the smaller volume of recent veterinary publications. Gemini has 0% from these years. With Google Search grounding, Gemini reaches 2-3%.

4. Clinical Safety

Binary (pass/fail) metric per response: a response passes only if it contains no dose error greater than 50%, no missed major contraindication, and no incorrect interaction severity.
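As a rough illustration, the three conditions can be combined into a single pass/fail check. The field names below are hypothetical, and reading "wrong dose >50%" as a relative deviation from a reference dose is an assumption; in the benchmark itself these findings are produced by an AI judge (Appendix A.2). The dose values are arbitrary numbers, not clinical doses.

```python
from dataclasses import dataclass


@dataclass
class SafetyFindings:
    """Judge findings for one response (hypothetical structure)."""
    stated_dose: float       # dose given in the response, arbitrary units
    reference_dose: float    # accepted dose in the same units
    missed_major_contraindication: bool
    wrong_interaction_severity: bool


def is_clinically_safe(findings: SafetyFindings, max_dose_error: float = 0.5) -> bool:
    """Pass only if all three conditions of the binary metric hold."""
    if findings.reference_dose <= 0:
        raise ValueError("reference_dose must be positive")
    # Assumed interpretation: relative deviation from the reference dose.
    dose_error = abs(findings.stated_dose - findings.reference_dose) / findings.reference_dose
    return (
        dose_error <= max_dose_error
        and not findings.missed_major_contraindication
        and not findings.wrong_interaction_severity
    )


print(is_clinically_safe(SafetyFindings(2.5, 2.0, False, False)))  # 25% deviation
print(is_clinically_safe(SafetyFindings(4.0, 2.0, False, False)))  # 100% deviation
```

Because the metric is binary, a single failed condition fails the whole response regardless of how well the rest of the answer performs.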

System	Common	Complex	Niche	Emerging	Overall
Knitify Fast	100%	100%	100%	100%	100%
Knitify Premium	100%	100%	100%	100%	100%
Knitify Standard	100%	86%	100%	100%	96%
Gemini 3.0 Pro	100%	100%	100%	100%	100%
Gemini 3.0 Flash	100%	100%	100%	100%	100%
Gemini 2.5 Flash	88%	75%	100%	100%	90%

Table 4. Gemini 2.5 Flash produces dangerous errors on 25% of complex veterinary pharmacotherapy queries — wrong species-specific doses or missed contraindications.
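The Overall column is consistent with a mean of the category scores weighted by the number of queries in each category (8, 8, 7, 7; see Appendix A.4). The report does not state how Overall was computed, so the sketch below is an assumption that happens to reproduce the two rows that are not uniformly 100%.

```python
from typing import Mapping

# Queries per category, from Appendix A.4.
QUERY_COUNTS = {"Common": 8, "Complex": 8, "Niche": 7, "Emerging": 7}


def weighted_overall(by_category: Mapping[str, float]) -> float:
    """Query-weighted mean of per-category percentages."""
    total_queries = sum(QUERY_COUNTS.values())
    weighted_sum = sum(by_category[name] * count for name, count in QUERY_COUNTS.items())
    return weighted_sum / total_queries


knitify_standard = {"Common": 100, "Complex": 86, "Niche": 100, "Emerging": 100}
gemini_25_flash = {"Common": 88, "Complex": 75, "Niche": 100, "Emerging": 100}

print(round(weighted_overall(knitify_standard)))  # table reports 96%
print(round(weighted_overall(gemini_25_flash)))   # table reports 90%
```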

5. Speed of Answer

Figure 4: Time to First Token (seconds, lower is better)

System	Time to first token
Gemini 3.0 Flash*	13.5s
Knitify Fast	17.8s
Knitify Standard	18.3s
Knitify Premium	28.7s
Gemini 3.0 Pro*	55.7s
3.0 Flash + Search*	23.4s
3.0 Pro + Search*	68.1s

Figure 4. *Gemini TTFT = total response time (non-streaming). Knitify Fast (18s) delivers 100% citation fidelity (CF) for a 4-second premium over Gemini 3.0 Flash (14s, 26% CF).

System	Common	Complex	Niche	Emerging	Overall
Knitify Fast	17.7s	16.9s	18.4s	18.3s	17.8s
Knitify Standard	16.5s	16.9s	17.5s	23.1s	18.3s
Knitify Premium	28.5s	26.5s	27.2s	32.9s	28.7s
Gemini 3.0 Flash*	13.5s	14.2s	13.1s	13.4s	13.5s
Gemini 3.0 Pro*	59.0s	51.7s	53.3s	59.0s	55.7s

Table 5. Knitify Standard (18s) is 3× faster than Gemini 3.0 Pro (56s).
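The headline speed comparisons follow directly from the Overall column. The short sketch below shows the arithmetic with an illustrative helper name. Note that the Gemini figures are total response times while the Knitify figures are time to first token, so the ratio compares two slightly different measurements.

```python
def speedup(slower_seconds: float, faster_seconds: float) -> float:
    """How many times faster the second system is than the first."""
    if faster_seconds <= 0:
        raise ValueError("faster_seconds must be positive")
    return slower_seconds / faster_seconds


# Overall values from Table 5.
gemini_30_pro = 55.7
gemini_30_flash = 13.5
knitify_standard = 18.3
knitify_fast = 17.8

print(round(speedup(gemini_30_pro, knitify_standard), 1))  # ratio behind the "3x" claim
print(round(knitify_fast - gemini_30_flash, 1))            # seconds Knitify Fast trails Flash
```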

6. Knitify Tier Comparison

	Fast	Standard	Premium
Best for	Quick clinical lookups	Case workup support	Literature reviews, CE
Citation Fidelity	100%	98%	98%
References / response	9	11	17
Avg words	~403	~735	~1,183
Time to first token	17.8s	18.3s	28.7s
Clinical safety	100%	96%	100%

7. Head-to-Head Comparisons

Direct comparisons between matched tiers — same-class models, all metrics.

Knitify Fast vs Gemini 3.0 Flash

Metric	Knitify Fast	Gemini 3.0 Flash
Citation Fidelity	100%	26%
References / response	8.8 verified	1.9 verified (5.4 fabricated)
% from 2025-2026	31%	0%
Speed (TTFT)	17.8s	13.5s

Knitify Premium vs Gemini 3.0 Pro

Metric	Knitify Premium	Gemini 3.0 Pro
Citation Fidelity	98%	52%
References / response	16.5 verified	3.0 verified (2.8 fabricated)
% from 2025-2026	21%	0%
Speed (TTFT)	28.7s	55.7s

Knitify Fast vs Gemini 3.0 Flash + Google Search

Metric	Knitify Fast	Flash + Search
Citation Fidelity	100%	47%
References / response	8.8 verified	3.2 verified (3.7 fabricated)
% from 2025-2026	31%	2%
Speed (TTFT)	17.8s	23.4s

Knitify Premium vs Gemini 3.0 Pro + Google Search

Metric	Knitify Premium	Pro + Search
Citation Fidelity	98%	89%
References / response	16.5 verified	4.2 verified (0.5 fabricated)
% from 2025-2026	21%	3%
Speed (TTFT)	28.7s	68.1s

8. Summary

Veterinary is where general-purpose AI fails hardest. Gemini 2.5 Flash fabricates 94% of veterinary citations. Knitify maintains 98-100% overall citation fidelity, and at least 96% in every veterinary query category.

Species-specific accuracy matters. Drug dosing, toxicity thresholds, and metabolism differ between dogs, cats, and humans. Gemini 2.5 Flash makes dangerous errors (wrong doses, missed species contraindications) on 25% of complex pharmacotherapy queries. Knitify Fast/Premium achieve 100% clinical safety.

Verified references for client communication. Every Knitify citation links to a real PubMed paper that veterinarians can share with pet owners or referring specialists — no risk of linking to non-existent papers.

17 verified references per Premium response. Deep coverage of veterinary literature for case workup, client education, and continuing education.

Appendix A: Evaluation Setup

A.1 Benchmark Design

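Seven systems were evaluated in the core benchmark: three Knitify tiers (Fast, Standard, Premium) and four Gemini configurations (2.5 Flash, 2.5 Pro, 3.0 Flash, 3.0 Pro). Gemini 3.0 Flash and 3.0 Pro were also run with Google Search grounding; these runs are reported as "Flash + Search" and "Pro + Search". All systems received identical queries. Gemini systems were prompted to cite veterinary peer-reviewed sources with author, title, journal, year, and DOI or PubMed link.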

A.2 Evaluation Judge

Clinical safety was scored by an independent AI judge. Citation fidelity was verified by checking each reference against PubMed, independently of the AI judge.

A.3 Citation Verification

Each cited reference is checked against PubMed to confirm it is a real, on-topic paper. A citation passes if (a) the identifier resolves to an existing paper and (b) the paper is relevant to the claim. Knitify citations are verified by the model's built-in quality assurance layer.

A.4 Query Tiers

Common (8 queries): Core veterinary internal medicine (e.g., feline hyperthyroidism, canine epilepsy)
Complex (8 queries): Species-specific pharmacotherapy and drug interactions (e.g., NSAID GI toxicity in dogs, tramadol metabolism differences)
Niche (7 queries): Therapeutic nutrition (e.g., renal diets for CKD, MCT diets for epilepsy)
Emerging (7 queries): Novel veterinary treatments (e.g., GS-441524 for FIP, stem cell therapy for OA)

A.5 Gemini Prompt

All Gemini systems received the following prompt template for each query:

You are a medical research assistant. Answer the following research question thoroughly.
Support every claim with citations to peer-reviewed sources. For each citation, include:
- First author et al.
- Paper title
- Journal name
- Year of publication
- DOI or PubMed link if available

Target approximately [TARGET_WORDS] words for the main answer (excluding references).
Format your references in a numbered list at the end.

Research question: [QUERY]

Gemini models were called via the Gemini API (generativelanguage.googleapis.com) with the prompt above. Except in the two "+ Search" configurations, which used Google Search grounding, no search grounding or retrieval tools were enabled, so responses are generated entirely from model parameters.

Target word counts were matched to the corresponding Knitify tier to ensure comparable output length.
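Filling the template for one query amounts to two placeholder substitutions. The sketch below is an assumption about how this could be done, not the harness that was actually used; `build_prompt` is a hypothetical name, and 403 is the approximate Fast-tier average word count from Section 6, used here only as an example target.

```python
PROMPT_TEMPLATE = """You are a medical research assistant. Answer the following research question thoroughly.
Support every claim with citations to peer-reviewed sources. For each citation, include:
- First author et al.
- Paper title
- Journal name
- Year of publication
- DOI or PubMed link if available

Target approximately [TARGET_WORDS] words for the main answer (excluding references).
Format your references in a numbered list at the end.

Research question: [QUERY]"""


def build_prompt(query: str, target_words: int) -> str:
    """Substitute the two placeholders in the Appendix A.5 template."""
    # Fill [TARGET_WORDS] first so text inside the query is never re-substituted.
    filled = PROMPT_TEMPLATE.replace("[TARGET_WORDS]", str(target_words))
    return filled.replace("[QUERY]", query)


# Query 18 from Appendix B.3, with an example word target.
prompt = build_prompt(
    "What is the clinical evidence for high-protein low-carbohydrate diets "
    "in achieving diabetic remission in cats?",
    target_words=403,
)
print(prompt)
```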

Appendix B: Test Queries

B.1 Common — Internal Medicine (8 queries)

#	Query
1	What is the evidence-based approach to feline hyperthyroidism treatment: radioiodine vs methimazole vs surgical thyroidectomy, including CKD unmasking risk?
2	What are the IRIS staging criteria for chronic kidney disease in cats and the evidence-based interventions at each stage?
3	What is the evidence for pimobendan in preclinical canine dilated cardiomyopathy from the PROTECT study and related trials?
4	What are the current heartworm disease treatment protocols in dogs including the melarsomine slow-kill protocol and pulmonary thromboembolism prevention?
5	What is the comparative efficacy of first-line anticonvulsants in canine epilepsy: phenobarbital vs potassium bromide vs levetiracetam?
6	What is the evidence for immune-mediated hemolytic anemia treatment in dogs including immunosuppression protocols and transfusion thresholds?
7	How should canine diabetes mellitus be managed — what insulin types (PZI, glargine, NPH) are used and how do remission rates compare to cats?
8	What is the current diagnostic and treatment approach to canine Cushing's disease including LDDS vs HDDS vs ACTH stimulation test accuracy?

B.2 Complex — Pharmacotherapy (8 queries)

#	Query
9	What are the comparative GI toxicity profiles of NSAIDs in dogs (carprofen, meloxicam, grapiprant, deracoxib) and the evidence for each?
10	What is the evidence for fluoroquinolone retinal toxicity in cats, the enrofloxacin dose threshold, and safe alternative antibiotics?
11	How do tramadol's analgesic effects differ between cats and dogs due to species differences in CYP2D6 metabolism and M1 metabolite production?
12	What is the evidence for phenobarbital drug interactions in dogs — CYP induction effects on cyclosporine, doxycycline, and thyroxine levels?
13	What is the comparative evidence for cyclosporine vs oclacitinib (Apoquel) for canine atopic dermatitis including onset of action and long-term safety?
14	What are the chemotherapy toxicity monitoring protocols in small animals for CHOP and carboplatin, including neutropenia nadir timing and dose modification?
15	What is the evidence for doxycycline esophageal stricture risk in cats and the recommended formulation and administration practices?
16	What is the evidence for gabapentin in cats for both pain management and anxiety, including dosing differences for each indication?

B.3 Niche — Therapeutic Nutrition (7 queries)

#	Query
17	What is the evidence for prescription renal diets in cats with CKD including phosphorus restriction levels and impact on survival?
18	What is the clinical evidence for high-protein low-carbohydrate diets in achieving diabetic remission in cats?
19	What is the evidence for omega-3 fatty acid supplementation in dogs with CKD — anti-inflammatory effects and survival data?
20	What is the evidence for medium-chain triglyceride diets in canine idiopathic epilepsy including clinical trial data and proposed mechanisms?
21	How should hydrolyzed protein diets vs novel protein diets be compared for food-responsive enteropathy in dogs?
22	What is the clinical trial evidence for Hill's y/d iodine-restricted diet in managing feline hyperthyroidism and its compliance limitations?
23	What are the evidence-based phosphate binder options (calcium carbonate, sevelamer, lanthanum) for feline CKD and their comparative efficacy?

B.4 Emerging — Novel Treatments (7 queries)

#	Query
24	What is the current clinical evidence for GS-441524 and remdesivir in treating feline infectious peritonitis, including remission criteria and relapse rates?
25	What is the evidence for toceranib (Palladia) in canine mast cell tumors including response rates by Patnaik grade?
26	What is the evidence for metronomic chemotherapy with low-dose cyclophosphamide in dogs — mechanisms, indications, and urinary toxicity monitoring?
27	What is the current evidence for stem cell therapy in canine osteoarthritis including clinical trial data and regulatory status?
28	What is the evidence for BOAS (brachycephalic obstructive airway syndrome) surgical vs medical management outcomes and quality of life data?
29	What is the current evidence for antibiotic stewardship in veterinary medicine and the data on resistance transmission between animals and humans?
30	What are the current multimodal chronic pain management protocols for cats including evidence for gabapentin, buprenorphine, and meloxicam combinations?

About Knitify Pet Health Veterinary
Knitify Pet Health Veterinary is the most accurate AI model for veterinary medicine, achieving 98-100% overall citation fidelity in the domain where general AI models fail hardest (Gemini without search grounding drops to 6-52%). Built for veterinary clinicians and animal health researchers who need species-specific, evidence-based answers with verified PubMed citations. Available via API at knitify.innovohealthlabs.com.