ChatGPT-o1 vs. ChatGPT-4o – Part 2

In the first blog article of the two-part ChatGPT-o1 vs ChatGPT-4o series, we reported on the advanced techno­logy in o1 and looked at the diffe­rences in estimating a sales price and the number of tennis balls that fit in a Boeing. Now it’s time to get down to business and test ChatGPT-o1 and its prede­cessor with four more tasks from a wide variety of subject areas:

Task 3: Trans­la­tion from English into Yoruba

We have noticed that the o1 model has achieved better perfor­mance on several language bench­marks. In this instance, we will be testing the model on the Yoruba language, which presents unique challenges due to its tonal nature.

Yoruba is a tonal language where the pitch or tone used when prono­un­cing a word can comple­tely change its meaning, making it diffi­cult for begin­ners to learn.

Additionally, words with similar spellings can have entirely different meanings depending on the tone. Yoruba is one of the major languages spoken in West Africa, especially in Nigeria, and its linguistic complexity offers a rich ground for evaluating the model’s capacity to handle tonal languages, where subtle shifts in pronunciation can lead to drastic changes in meaning.
As a native speaker, I was thrilled to test the models in terms of their translation capabilities.

For example, the word ‘apẹ̀rẹ̀’ could mean diffe­rent things based on its tone, unders­coring the importance of under­stan­ding tonal distinc­tions in the language:
  • apẹ̀rẹ̀ – Basket
  • apẹẹrẹ – Example


Similarly, the word “koko” can have several meanings:
  • Kókò – Cocoyam
  • Kókó – point (the most important part of a discus­sion)
  • Kòkò – pot
  • Koóko – Grass
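
To illustrate why the tone marks matter for a text-processing system, here is a minimal Python sketch. It is only an illustration, not part of the test: the helper name `strip_tone_marks` is made up, and the glosses are taken from the list above. Once the diacritics are removed, three different words collapse into the same string.

```python
import unicodedata

def strip_tone_marks(word: str) -> str:
    """Remove all combining marks (tone accents, under-dots) and lowercase.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return bare.lower()

# Three of the "koko" variants and their meanings, as listed in the article
koko_words = {
    "Kókò": "cocoyam",
    "Kókó": "point",
    "Kòkò": "pot",
}

# Without tone marks, all three spellings become indistinguishable
collapsed = {strip_tone_marks(word) for word in koko_words}
print(collapsed)  # {'koko'}
```

A model that ignores or mishandles the tone marks therefore has no way of telling these words apart from spelling alone.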


Below we test o1 and GPT-4o for the word ‘koko’ with some wordplay.

Result ChatGPT-o1 for task 3

Result ChatGPT-4o for task 3

Compa­rison of results and facts:

ChatGPT-o1 did an impres­sive job of trans­la­ting the state­ment correctly, especi­ally for keywords such as “basket”, which was trans­lated as apẹrẹ and “cocoyams” as kókó.

In compa­rison, GPT-4o trans­lated “basket” as apo, which refers to a bag, and “cocoyam” was wrongly trans­lated as isu ekó.

My conclu­sion at this point: ChatGPT-o1 performs better.

Task 4: Writing poetry with a fixed number of words

For the next task, the instruc­tion is to formu­late a poem on the subject of “work-life balance” with exactly 100 words. As we don’t give the algorithms any leeway with the number of words, this is a parti­cu­larly tough nut to crack.

Result ChatGPT-o1 for task 4

Result ChatGPT-4o for task 4

A word counter shows that neither model hit exactly 100 words. ChatGPT-o1’s poem has 118 words, which is 18 too many. Interestingly, GPT-4o’s poem has 94 words, which is six too few.
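
A check like this is easy to reproduce. The following Python sketch assumes that a "word" is any whitespace-separated token; other word counters may treat hyphenated words or punctuation differently. The function names and the sample text are placeholders, not the poems the models produced.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; punctuation stays attached to words."""
    return len(text.split())

def deviation_from_target(text: str, target: int = 100) -> int:
    """Positive result: too many words. Negative result: too few."""
    return count_words(text) - target

# Placeholder text, NOT one of the generated poems
sample = "Work ends, life begins,\nbalance lives in between."
print(count_words(sample))            # 8
print(deviation_from_target(sample))  # -92
```

Applied to the two poems, this kind of check would report +18 for a 118-word poem and -6 for a 94-word poem.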

Our conclusion: GPT-4o was only six words off the mark and performs better in this task than its successor.

Task 5: Our math question

Our first bonus question targets the mathematical abilities of both models. So that it doesn’t get too easy, we have chosen a task from a list of supposedly the 15 most difficult (digital) SAT questions (for those interested, there is this solution path).

Result ChatGPT-o1 for task 5

Result ChatGPT-4o for task 5

Result comparison: This example shows the key strength of o1. The correct answer is 2.25, or 9/4, which o1-preview calculated correctly, while GPT-4o answered 1.125, or 9/8. Although GPT-4o set up the equation correctly at first, the model got confused when computing q2, the dynamic pressure at a velocity of 1.5υ.
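
The expected result can be checked in a few lines of Python. The question itself is not reproduced in this article, so the sketch below rests on an assumption: that it uses the standard dynamic pressure formula q = ½ · n · v² (n being the fluid density) and asks for the ratio of the pressure at velocity 1.5v to the pressure at velocity v. The function and variable names are illustrative.

```python
from fractions import Fraction

def dynamic_pressure(density: Fraction, velocity: Fraction) -> Fraction:
    """Dynamic pressure q = 1/2 * n * v^2 (assumed formula, see lead-in)."""
    return Fraction(1, 2) * density * velocity ** 2

# Density and base velocity cancel out in the ratio, so any non-zero values work
n = Fraction(1)
v = Fraction(1)

q1 = dynamic_pressure(n, v)                   # slower fluid
q2 = dynamic_pressure(n, Fraction(3, 2) * v)  # faster fluid at 1.5v

ratio = q2 / q1
print(ratio, float(ratio))  # 9/4 2.25
```

Because the velocity enters the formula squared, scaling it by 1.5 scales the pressure by 1.5² = 2.25, independent of density and base velocity.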

Conclusion: Since only o1 arrived at the correct result, this task goes to o1.

Task 6: Our programming task

Our second bonus question is intended to reveal how well the models support the (supposed) everyday work of a data scientist, or anyone else who programs in Python. The task is as follows: “Write a program in Python that adds two integers, without arithmetic operation” (if that’s not a classic data science challenge from everyday life 😀 ).

Result ChatGPT-o1 for task 6

Result ChatGPT-4o for task 6

Compa­rison of results and facts:

Good news: the code from both models ran in the respective tests. There is one caveat, however: o1’s solution was more robust and accurate.
While GPT-4o’s solution handled some cases with negative numbers correctly, it ran into an infinite loop for certain inputs, e.g., when adding -1 + 2. This happens because Python integers have arbitrary precision, so the carry bits keep propagating indefinitely when negative numbers are involved.
As the code output shows, o1’s solution ran the three cases in 14 ms, while GPT-4o’s solution got stuck in an infinite loop on the negative example.
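
For readers who want to try this themselves, here is a minimal sketch of one common way to avoid the endless carry propagation. It is not the code that either model produced. It assumes that inputs and results fit into a signed 32-bit range, and the names `MASK`, `MAX_INT` and `add_without_arithmetic` are chosen for illustration.

```python
MASK = 0xFFFFFFFF     # keep only the lowest 32 bits (assumed word size)
MAX_INT = 0x7FFFFFFF  # largest positive value of a signed 32-bit integer

def add_without_arithmetic(a: int, b: int) -> int:
    """Add two integers with bitwise operators only (32-bit sketch)."""
    a &= MASK
    b &= MASK
    while b != 0:
        carry = (a & b) << 1  # bits that overflow into the next position
        a = (a ^ b) & MASK    # sum without carries, truncated to 32 bits
        b = carry & MASK      # the carry drops out after at most 32 rounds
    # Values above MAX_INT represent negative numbers in two's complement
    return a if a <= MAX_INT else ~(a ^ MASK)

print(add_without_arithmetic(-1, 2))  # 1
print(add_without_arithmetic(3, 4))   # 7
print(add_without_arithmetic(-5, 2))  # -3
```

Without the mask, the loop never ends for -1 + 2, because Python represents -1 as an unbounded run of 1-bits and the carry keeps moving to the left forever. With the mask, the carry is cut off at bit 32, so the loop always terminates.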

Conclu­sion: The quality of the o1 result is better, so this task goes to o1.

Our compa­ra­tive conclu­sion

In my opinion, ChatGPT-o1 solved four of the six tasks better, or at least in a more humanly comprehensible way. Its predecessor, ChatGPT-4o, did better only in the poem task with a fixed number of words. The tennis ball estimate ended in a tie.

The advan­tages and resul­ting possi­bi­li­ties of o1 were parti­cu­larly impres­sive in the case of complex trans­la­tions from possibly less common languages, such as Yoruba in our example, when a deeper under­stan­ding of the context is important. o1 also demons­trated its capabi­li­ties in the programming and math tasks.

The advan­tages of o1 are of course offset by the longer running time that you (currently) still have to accept. Perhaps the golden mean is a combi­na­tion of both models, with initial prompts with GPT-4o followed by more specific ones for o1. I am excited to see where the journey for o1 will take us.

Mark Willoughby

Data Scien­tist
