FHIR Workbench

Compare and track how Large Language Models perform on four FHIR (Fast Healthcare Interoperability Resources) benchmarks: FHIR-QA, FHIR-RESTQA, FHIR-ResourceID, and Note2FHIR.

Rank   | Model                                     | Size   | FHIR-QA | FHIR-RESTQA | FHIR-ResourceID | Note2FHIR | Avg
-------|-------------------------------------------|--------|---------|-------------|-----------------|-----------|------
#1 🥇  | GPT-4o                                    | Closed | 94.0%   | 92.7%       | 99.9%           | 34.7%     | 80.3%
#2 🥈  | Gemini-2-Flash                            | Closed | 94.0%   | 90.0%       | 96.9%           | 34.0%     | 78.7%
#3 🥉  | Gemini-1.5-Pro                            | Closed | 93.3%   | 91.3%       | 93.7%           | 34.3%     | 78.2%
#4     | Deepseek-v3                               | 671B   | 94.0%   | 94.0%       | 91.4%           | 32.2%     | 77.9%
#5     | Qwen/Qwen2.5-Coder-32B-Instruct           | 32B    | 90.0%   | 91.3%       | 88.8%           | 33.5%     | 75.9%
#6     | mistralai/Mistral-Small-24B-Instruct-2501 | 24B    | 88.7%   | 92.0%       | 88.6%           | 34.0%     | 75.8%
#7     | Gemini-1.5-Flash                          | Closed | 92.0%   | 90.7%       | 92.0%           | 24.1%     | 74.7%
#8     | GPT-4o-mini                               | Closed | 95.3%   | 94.0%       | 92.1%           | 16.3%     | 74.4%
#9     | Qwen/Qwen2.5-Coder-7B-Instruct            | 7B     | 95.3%   | 87.3%       | 89.2%           | 21.2%     | 73.3%
#10    | GPT-4.5-preview                           | Closed | 90.7%   | 92.0%       | N/A             | 36.3%     | 73.0%
#11    | microsoft/phi-4                           | 14B    | 88.7%   | 89.3%       | 82.9%           | 29.0%     | 72.5%
#12    | meta-llama/Llama-3.1-8B-Instruct          | 8B     | 85.3%   | 88.0%       | 82.0%           | 20.8%     | 69.0%
#13    | allenai/Llama-3.1-Tulu-3-8B               | 8B     | 83.3%   | 85.3%       | 73.2%           | 20.3%     | 65.5%
#14    | google/gemma-2-9b-it                      | 9B     | 57.3%   | 82.0%       | 95.2%           | 6.3%      | 60.2%
#15    | BioMistral/BioMistral-7B-DARE             | 7B     | 85.3%   | 84.0%       | 58.1%           | 7.6%      | 58.8%
#16    | allenai/OLMo-2-1124-7B-Instruct           | 7B     | 82.0%   | 74.0%       | 61.1%           | 3.8%      | 55.2%

"Closed" marks closed-weight models whose parameter count is not published.
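The Avg column appears to be the arithmetic mean of each model's available benchmark scores: for GPT-4.5-preview, whose FHIR-ResourceID entry is N/A, the listed 73.0% matches the mean of its three reported scores, (90.7 + 92.0 + 36.3) / 3. The sketch below reproduces that calculation and the ranking for three rows copied from the table. Skipping N/A entries is an assumption inferred from the numbers, not a documented rule; the names `SCORES`, `average`, and `rank` are illustrative.

```python
# Scores (percent) copied from the table; None marks an N/A entry.
# Assumption: N/A entries are skipped when averaging (inferred from GPT-4.5-preview's row).
SCORES = {
    "GPT-4o": [94.0, 92.7, 99.9, 34.7],
    "GPT-4.5-preview": [90.7, 92.0, None, 36.3],
    "google/gemma-2-9b-it": [57.3, 82.0, 95.2, 6.3],
}


def average(scores):
    """Mean of the available benchmark scores, ignoring None (N/A) entries."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)


def rank(table):
    """Return (model, rounded average) pairs sorted from highest to lowest average."""
    rows = [(name, round(average(s), 1)) for name, s in table.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for position, (name, avg) in enumerate(rank(SCORES), start=1):
    print(f"#{position} {name}: {avg:.1f}%")
```

Averaging only the available scores keeps a model with a missing benchmark on the board, but it means its average is not directly comparable to one computed over all four benchmarks.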

Submit Your Model

Have a FHIR-capable model you want included in the leaderboard? Provide its HuggingFace repo URL and we'll evaluate it.

Enter the HuggingFace URL and click Submit. This opens an email for you to send.
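Open models in the table are listed by their HuggingFace repo ID (for example `microsoft/phi-4`), which is the `org/model` part of a URL such as `https://huggingface.co/microsoft/phi-4`. As a minimal sketch, assuming submissions should be model repo URLs of that shape, a hypothetical helper `repo_id_from_url` could check a URL and extract the repo ID before you send it. This is not part of the leaderboard's actual submission process.

```python
from urllib.parse import urlparse

# Hypothetical helper: not part of the leaderboard's submission form.
# Path prefixes that belong to non-model repos on huggingface.co.
NON_MODEL_PREFIXES = {"datasets", "spaces"}


def repo_id_from_url(url):
    """Return 'org/model' from a HuggingFace model URL, or None if the URL doesn't match."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    if parsed.netloc not in ("huggingface.co", "www.huggingface.co"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # A model URL needs at least an org and a model name, and must not point at a dataset or Space.
    if len(parts) < 2 or parts[0] in NON_MODEL_PREFIXES:
        return None
    return f"{parts[0]}/{parts[1]}"


print(repo_id_from_url("https://huggingface.co/microsoft/phi-4"))
```

Extra path segments such as `/tree/main` are ignored, so a link to a branch or file view still yields the repo ID.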