O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？-人工智慧-PHP中文網

AI驅動的推理模型在2025年席捲世界！隨著DeepSeek-R1和O3-Mini的推出，我們在AI聊天機器人中看到了前所未有的邏輯推理能力。在本文中，我們將通過其API訪問這些模型，並評估其邏輯推理技能，以找出O3-Mini是否可以替換DeepSeek-R1。我們將比較它們在標準基準和現實世界中的性能，例如解決邏輯難題，甚至建立俄羅斯方塊遊戲！因此，搭扣並加入騎行。

目錄的

deepseek-r1 vs o3-mini：邏輯推理基準
>
deepSeek-r1 vs o3-mini：api定價比較
- 推理比較
任務1：構建俄羅斯四角
任務2：分析關係不等式
任務3：數學

邏輯推理比較比較摘要

deepseek-r1 vs o3-mini：邏輯推理基準 DeepSeek-R1和O3-Mini為結構化思維和推論提供了獨特的方法，使它們適合各種複雜的解決問題的任務。在我們談論他們的基準性能之前，讓我們首先偷偷窺視這些模型的架構。 O3米尼是Openai最先進的推理模型。它使用密集的變壓器體系結構，使用所有模型參數處理每個令牌，以實現強大的性能，但資源很高。相比之下，DeepSeek最合乎邏輯的模型R1採用了Experts（MOE）框架的混合物，僅激活每個輸入的參數子集，以提高效率。這使DeepSeek-R1在保持穩定的性能的同時更具可擴展性和計算優化。

O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？了解更多：Openai的O3米尼比DeepSeek-R1更好？

現在，我們需要看到的是這些模型在邏輯推理任務中的表現如何。首先，讓我們看一下他們在LiveBench基準測試中的表現。

來源：livebench.ai >基準結果表明，除了數學外，Openai的O3-Mini幾乎在幾乎所有方面都優於DeepSeek-R1。與DeepSeek的71.38相比，全球平均得分為73.94，O3-Mini的總體表現稍強。它在推理方面尤其出色，與DeepSeek的83.17相比，達到89.58，反映了出色的分析和解決問題的能力。

也閱讀：Google Gemini 2.0 Pro vs DeepSeek-R1：編碼更好？

deepSeek-r1 vs o3-mini：API定價比較

>由於我們正在通過其API測試這些模型，讓我們看看這些模型的成本。 >

Model	Context length	Input Price	Cached Input Price	Output Price
o3-mini	200k	.10/M tokens	.55/M tokens	.40/M tokens
deepseek-chat	64k	.27/M tokens	.07/M tokens	.10/M tokens
deepseek-reasoner	64k	.55/M tokens	.14/M tokens	.19/M tokens

在桌子上可以看出，Openai的O3-Mini在API成本方面幾乎是DeepSeek R1的兩倍。它收費為每百萬個代幣，輸入為每百萬個代幣，產出$ 4.40，而DeepSeek R1的投入率更高的價格為每百萬個代幣的成本效益更高，輸入為2.19美元，而產出的價格為2.19美元，使其成為大型應用程序的預算友好選擇。

來源：DeepSeek-r1 | O3-Mini

如何通過API

訪問DeepSeek-R1和O3-Mini

>在進行動手績效比較之前，讓我們學習如何使用API訪問DeepSeek-R1和O3-Mini。

您為此所要做的就是導入必要的庫和API鍵：>

from openai import OpenAI
from IPython.display import display, Markdown
import time

登入後複製

with open("path_of_api_key") as file:
   openai_api_key = file.read().strip()

登入後複製

> deepSeek-r1 vs o3-mini：邏輯推理比較

with open("path_of_api_key") as file:
   deepseek_api = file.read().strip()

登入後複製

>現在我們已經獲得了API訪問權限，讓我們根據其邏輯推理能力比較DeepSeek-R1和O3-Mini。為此，我們將在模型中給出相同的提示，並根據這些指標評估它們的響應：

模型花費的時間生成響應的時間

生成的響應的質量，

產生響應的成本。
然後，我們將根據其性能為每個任務的模型0或1分為0或1。因此，讓我們嘗試一下任務，看看誰在DeepSeek-R1與O3-Mini推理之戰中成為贏家！

提示：

>

“為此問題編寫Python代碼：為Tetris Game生成Python代碼”

>輸入到DeepSeek-R1 API

> DeepSeek-R1

INPUT_COST_CACHE_HIT = 0.14 / 1_000_000  # <pre class="brush:php;toolbar:false">task1_start_time = time.time()


client = OpenAI(api_key=api_key)

messages = messages=[
       {
       "role": "system",
       "content": """You are a professional Programmer with a large experience ."""


   },
{
       "role": "user",
       "content": """write a python code for this problem: generate a python code for Tetris game.
"""


   }
   ]


# Use a compatible encoding (cl100k_base is the best option for new OpenAI models)
encoding = tiktoken.get_encoding("cl100k_base")


# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)


completion = client.chat.completions.create(
   model="o3-mini-2025-01-31",
   messages=messages
)


output_tokens = len(encoding.encode(completion.choices[0].message.content))


task1_end_time = time.time()




input_cost_per_1k = 0.0011  # Example: <pre class="brush:php;toolbar:false">INPUT_COST_CACHE_HIT = 0.14 / 1_000_000  # <pre class="brush:php;toolbar:false">task2_start_time = time.time()

client = OpenAI(api_key=api_key)

messages = [
    {
        "role": "system",
        "content": """You are an expert in solving Reasoning Problems. Please solve the given problem"""
    },
    {
        "role": "user",
        "content": """In the following question, assuming the given statements to be true, find which of the conclusions among given conclusions is/are definitely true and then give your answers accordingly.
        Statements: H > F ≤ O ≤ L; F ≥ V < D
        Conclusions:
        I. L ≥ V
        II. O > D
        The options are:
        A. Only I is true 
        B. Only II is true
        C. Both I and II are true
        D. Either I or II is true
        E. Neither I nor II is true
        """
    }
]

# Use a compatible encoding (cl100k_base is the best option for new OpenAI models)
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))

task2_end_time = time.time()


input_cost_per_1k = 0.0011  # Example: <pre class="brush:php;toolbar:false">INPUT_COST_CACHE_HIT = 0.14 / 1_000_000  # <pre class="brush:php;toolbar:false">task3_start_time = time.time()
client = OpenAI(api_key=api_key)
messages = [
        {
		"role": "system",
		"content": """You are a Expert in solving Reasoning Problems. Please solve the given problem"""

	},
 {
		"role": "user",
		"content": """ 
Study the given matrix carefully and select the number from among the given options that can replace the question mark (?) in it.
    __________________
	|  7  | 13	| 174| 
	|  9  | 25	| 104|
	|  11 | 30	| ?  |
    |_____|_____|____|
    The options are: 
   A 335
   B 129
   C 431 
   D 100
   Please mention your approch that you have taken at each step
 """

	}
    ]

# Use a compatible encoding (cl100k_base is the best option for new OpenAI models)
encoding = tiktoken.get_encoding("cl100k_base")

# Calculate token counts
input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages)

completion = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=messages
)

output_tokens = len(encoding.encode(completion.choices[0].message.content))

task3_end_time = time.time()


input_cost_per_1k = 0.0011  # Example: .005 per 1,000 input tokens
output_cost_per_1k = 0.0044  # Example: .015 per 1,000 output tokens

# Calculate cost
input_cost = (input_tokens / 1000) * input_cost_per_1k
output_cost = (output_tokens / 1000) * output_cost_per_1k
total_cost = input_cost + output_cost

# Print results
print(completion.choices[0].message)
print("----------------=Total Time Taken for task 3:----------------- ", task3_end_time - task3_start_time)
print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}")
print(f"Estimated Cost: ${total_cost:.6f}")

# Display result
from IPython.display import Markdown
display(Markdown(completion.choices[0].message.content))

登入後複製

.14 per 1M tokens INPUT_COST_CACHE_MISS = 0.55 / 1_000_000 # .55 per 1M tokens OUTPUT_COST = 2.19 / 1_000_000 # .19 per 1M tokens # Start timing task3_start_time = time.time() # Initialize OpenAI client for DeepSeek API client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com") messages = [ { "role": "system", "content": """You are a Expert in solving Reasoning Problems. Please solve the given problem""" }, { "role": "user", "content": """ Study the given matrix carefully and select the number from among the given options that can replace the question mark (?) in it. __________________ | 7 | 13 | 174| | 9 | 25 | 104| | 11 | 30 | ? | |_____|_____|____| The options are: A 335 B 129 C 431 D 100 Please mention your approch that you have taken at each step """ } ] # Get token count using tiktoken (adjust model name if necessary) encoding = tiktoken.get_encoding("cl100k_base") # Use a compatible tokenizer input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages) # Call DeepSeek API response = client.chat.completions.create( model="deepseek-reasoner", messages=messages, stream=False ) # Get output token count output_tokens = len(encoding.encode(response.choices[0].message.content)) task3_end_time = time.time() total_time_taken = task3_end_time - task3_start_time # Assume cache miss for worst-case pricing (adjust if cache info is available) input_cost = (input_tokens / 1_000_000) * INPUT_COST_CACHE_MISS output_cost = (output_tokens / 1_000_000) * OUTPUT_COST total_cost = input_cost + output_cost # Print results print("Response:", response.choices[0].message.content) print("------------------ Total Time Taken for Task 3: ------------------", total_time_taken) print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}") print(f"Estimated Cost: ${total_cost:.6f}") # Display result from IPython.display import Markdown display(Markdown(response.choices[0].message.content)).005 per 1,000 input tokens output_cost_per_1k = 0.0044 # Example: .015 per 1,000 output tokens # Calculate cost input_cost = (input_tokens / 1000) * input_cost_per_1k output_cost = (output_tokens / 1000) * output_cost_per_1k total_cost = input_cost + output_cost # Print results print(completion.choices[0].message) print("----------------=Total Time Taken for task 2:----------------- ", task2_end_time - task2_start_time) print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}") print(f"Estimated Cost: ${total_cost:.6f}") # Display result from IPython.display import Markdown display(Markdown(completion.choices[0].message.content)).14 per 1M tokens INPUT_COST_CACHE_MISS = 0.55 / 1_000_000 # .55 per 1M tokens OUTPUT_COST = 2.19 / 1_000_000 # .19 per 1M tokens # Start timing task2_start_time = time.time() # Initialize OpenAI client for DeepSeek API client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com") messages = [ {"role": "system", "content": "You are an expert in solving Reasoning Problems. Please solve the given problem."}, {"role": "user", "content": """ In the following question, assuming the given statements to be true, find which of the conclusions among given conclusions is/are definitely true and then give your answers accordingly. Statements: H > F ≤ O ≤ L; F ≥ V < D Conclusions: I. L ≥ V II. O > D The options are: A. Only I is true B. Only II is true C. Both I and II are true D. Either I or II is true E. Neither I nor II is true """} ] # Get token count using tiktoken (adjust model name if necessary) encoding = tiktoken.get_encoding("cl100k_base") # Use a compatible tokenizer input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages) # Call DeepSeek API response = client.chat.completions.create( model="deepseek-reasoner", messages=messages, stream=False ) # Get output token count output_tokens = len(encoding.encode(response.choices[0].message.content)) task2_end_time = time.time() total_time_taken = task2_end_time - task2_start_time # Assume cache miss for worst-case pricing (adjust if cache info is available) input_cost = (input_tokens / 1_000_000) * INPUT_COST_CACHE_MISS output_cost = (output_tokens / 1_000_000) * OUTPUT_COST total_cost = input_cost + output_cost # Print results print("Response:", response.choices[0].message.content) print("------------------ Total Time Taken for Task 2: ------------------", total_time_taken) print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}") print(f"Estimated Cost: ${total_cost:.6f}") # Display result from IPython.display import Markdown display(Markdown(response.choices[0].message.content)).005 per 1,000 input tokens output_cost_per_1k = 0.0044 # Example: .015 per 1,000 output tokens # Calculate cost input_cost = (input_tokens / 1000) * input_cost_per_1k output_cost = (output_tokens / 1000) * output_cost_per_1k total_cost = input_cost + output_cost print(completion.choices[0].message) print("----------------=Total Time Taken for task 1:----------------- ", task1_end_time - task1_start_time) print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}") print(f"Estimated Cost: ${total_cost:.6f}") # Display result from IPython.display import Markdown display(Markdown(completion.choices[0].message.content)).14 per 1M tokens INPUT_COST_CACHE_MISS = 0.55 / 1_000_000 # .55 per 1M tokens OUTPUT_COST = 2.19 / 1_000_000 # .19 per 1M tokens # Start timing task1_start_time = time.time() # Initialize OpenAI client for DeepSeek API client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com") messages = [ { "role": "system", "content": """You are a professional Programmer with a large experience.""" }, { "role": "user", "content": """write a python code for this problem: generate a python code for Tetris game.""" } ] # Get token count using tiktoken (adjust model name if necessary) encoding = tiktoken.get_encoding("cl100k_base") # Use a compatible tokenizer input_tokens = sum(len(encoding.encode(msg["content"])) for msg in messages) # Call DeepSeek API response = client.chat.completions.create( model="deepseek-reasoner", messages=messages, stream=False ) # Get output token count output_tokens = len(encoding.encode(response.choices[0].message.content)) task1_end_time = time.time() total_time_taken = task1_end_time - task1_start_time # Assume cache miss for worst-case pricing (adjust if cache info is available) input_cost = (input_tokens / 1_000_000) * INPUT_COST_CACHE_MISS output_cost = (output_tokens / 1_000_000) * OUTPUT_COST total_cost = input_cost + output_cost # Print results print("Response:", response.choices[0].message.content) print("------------------ Total Time Taken for Task 1: ------------------", total_time_taken) print(f"Input Tokens: {input_tokens}, Output Tokens: {output_tokens}") print(f"Estimated Cost: ${total_cost:.6f}") # Display result from IPython.display import Markdown display(Markdown(response.choices[0].message.content))>您可以在這裡找到DeepSeek-R1的完整響應。

> >輸出令牌成本：

>輸入令牌：28 |輸出令牌：3323 |估計成本：$ 0.0073 O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？

>代碼輸出

>輸入到O3-Mini API

> O3-Mini >

>您可以在這裡找到O3-Mini的完整響應。

>輸出令牌成本： >輸入令牌：28 |輸出令牌：3235 |估計成本：$ 0.014265

O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？ >代碼輸出

比較分析

在此任務中，需要模型來生成允許實際遊戲玩法的功能性俄羅斯代碼。如代碼輸出視頻所示，DeepSeek-R1成功地產生了完全有效的實現。相比之下，儘管O3-Mini的代碼看起來良好，但在執行過程中遇到了錯誤。結果，在這種情況下，DeepSeek-R1在這種情況下優於O3 Mini，提供了更可靠和可播放的解決方案。

>得分：

deepSeek-r1：1 | O3-Mini：0

任務2：分析關係不平等

此任務要求模型有效地分析關係不平等而不是依靠基本的分類方法。

>提示：“”在以下問題中，假設給定的陳述為真，在給定的結論中找到了哪個結論是/肯定是正確的，然後相應地給出您的答案。

語句：

h＆gt; f≤o≤l; f≥V＆lt; D

結論：I。 L≥VII。 o＆gt; D

選項是：

a。只有我是true

b。只有i是true

> c。 i和ii都是true

> d。我或ii是true

> e。我和ii都不是真實的。

>輸入到DeepSeek-R1 API>

from openai import OpenAI
from IPython.display import display, Markdown
import time

登入後複製

>輸出令牌成本： >輸入令牌：136 |輸出令牌：352 |估計成本：$ 0.000004

DeepSeek-R1

O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？

>輸入到O3-Mini API

with open("path_of_api_key") as file:
   openai_api_key = file.read().strip()

登入後複製

>輸出令牌成本：

>輸入令牌：135 |輸出令牌：423 |估計成本：$ 0.002010 O3-Mini >

比較分析 O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？

O3米尼提供了最有效的解決方案，提供了明顯而準確的響應，在大得多的時間內提供了響應。它在確保邏輯健全的同時保持清晰度，使其非常適合快速推理任務。 DeepSeek-r1雖然同樣正確，但要慢得多且詳細。它詳細的邏輯關係分解增強了解釋性，但對於直接評估可能會感到過分。儘管這兩種模型得出了同樣的結論，但O3-Mini的速度和直接方法使其成為實際使用的更好選擇。

分數： deepseek-r1：0 | O3-Mini：1

>任務3：數學中的邏輯推理

此任務挑戰模型識別數值模式，這可能涉及算術操作，乘法或數學規則的組合。該模型必須採用結構化的方法來有效地推斷出隱藏的邏輯。

提示：>“>仔細研究給定的矩陣，然後從給定選項中選擇可以替換問號（？）的數字。

____________

| 7 | 13 | 174 |

| 9 | 25 | 104 |

| 11 | 30 | ？ |

| _____ | ____ | ___ |

選項是：

a 335

b 129

c 431

d 100

請提及您在每個步驟中採取的方法。

>輸入到DeepSeek-R1 API>

>輸出令牌成本：

>輸入令牌：134 |輸出令牌：274 |估計成本：$ 0.000003

from openai import OpenAI
from IPython.display import display, Markdown
import time

登入後複製

DeepSeek-R1

>輸入到O3-Mini API

> O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？

>輸出令牌成本： >輸入令牌：134 |輸出令牌：736 |估計成本：$ 0.003386

O3-Mini with open("path_of_api_key") as file: openai_api_key = file.read().strip()

輸出>>>

O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？

比較分析

O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？

在這裡，每一行遵循的模式為：

O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？（第一個數字）^3-（第二個數字）^2 = 3rd number

應用此模式：

第1：7^3 - 13^2 = 343 - 169 = 174

第2行2：9^3 - 25^2 = 729 - 625 = 104

第3行：11^3 - 30^2 = 1331 - 900 = 431

因此，正確的答案是431。

> DeepSeek-R1正確識別並應用了此模式，從而導致正確的答案。它的結構化方法可確保准確性，儘管計算結果需要大大時間。另一方面，O3-Mini無法建立一致的模式。它嘗試了多個操作，例如乘法，加法和指示，但沒有得出確定的答案。這會導致不清楚的響應。總體而言，DeepSeek-R1在邏輯推理和準確性方面優於O3-Mini，而O3米尼由於其不一致和無效的方法而掙扎。

得分：

最終分數：DeepSeek-r1：2 | O3-Mini：1

邏輯推理比較摘要

Task No.	Task Type	Model	Performance	Time Taken (seconds)	Cost
1	Code Generation	DeepSeek-R1	✅ Working Code	606.45	.0073
		o3-mini	❌ Non-working Code	99.73	.014265
2	Alphabetical Reasoning	DeepSeek-R1	✅ Correct	74.28	.000004
		o3-mini	✅ Correct	8.08	.002010
3	Mathematical Reasoning	DeepSeek-R1	✅ Correct	450.53	.000003
		o3-mini	❌ Wrong Answer	12.37	.003386

結論

正如我們在此比較中看到的那樣，DeepSeek-R1和O3-Mini都表現出滿足不同需求的獨特優勢。 DeepSeek-R1在精確驅動的任務中擅長，尤其是在數學推理和復雜的代碼生成中，使其成為需要邏輯深度和正確性的應用程序的有力候選者。但是，一個重要的缺點是其響應時間較慢，部分原因是持續的服務器維護問題影響了其可訪問性。另一方面，O3-Mini提供的響應時間明顯更快，但是其產生不正確結果的趨勢限制了其對高風險推理任務的可靠性。

該分析強調了語言模型中速度和準確性之間的權衡。雖然O3-Mini可能對快速，低風險的應用程序有用，但DeepSeek-R1是解決推理密集型任務的優越選擇，只要解決了其潛伏期問題。隨著AI模型的不斷發展，在性能效率和正確性之間達到平衡將是優化各個領域的AI驅動工作流程的關鍵。

也請閱讀：Openai的O3-Mini可以在編碼中擊敗Claude Sonnet 3.5？

常見問題

> Q1。 DeepSeek-R1和O3-Mini之間的主要區別是什麼？ DeepSeek-R1在數學推理和復雜的代碼生成方面表現出色，非常適合需要邏輯深度和準確性的應用。另一方面，O3-Mini的速度明顯更快，但通常會犧牲準確性，導致偶爾出現不正確的輸出。對於編碼任務，DeepSeek-R1比O3-Mini好嗎？ DeepSeek-r1是編碼和推理密集型任務的更好選擇，因為它具有出色的精度和處理複雜邏輯的能力。雖然O3-Mini提供了更快的響應，但它可能會產生錯誤，從而使其對高風險編程任務的可靠性降低。

Q3。 O3-Mini是否適用於現實世界應用？ O3-Mini最適合低風險，速度依賴的應用程序，例如聊天機器人，休閒文本生成和交互式AI體驗。但是，對於需要高精度的任務，DeepSeek-R1是首選的選項。哪種模型更適合推理和解決問題 - DeepSeek-R1或O3-Mini？ DeepSeek-R1具有出色的邏輯推理和解決問題的能力，使其成為數學計算，編程援助和科學查詢的強大選擇。 O3-Mini在復雜的解決問題的方案中提供了快速但有時不一致的響應。

>

以上是O3-Mini可以替換DeepSeek-R1進行邏輯推理嗎？的詳細內容。更多資訊請關注PHP中文網其他相關文章！