gpt-oss has arrived on AWS!

Last updated at 2025-08-08, posted at 2025-08-05

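Update (8/8): gpt-oss has come to Bedrock as well, so I added a section in the second half!

It's already here!

The SageMaker JumpStart (SageMaker AI JumpStart?) model list now shows the name OpenAI!

At the moment it does not seem to be "Bedrock ready", so I could not find it in the Bedrock Marketplace.

You can choose between gpt-oss-20b and gpt-oss-120b.

Here is what gpt-oss-20b looks like.

Pressing the Deploy button takes you to a screen where deployment looks possible.
ml.p5.48xlarge?!
I hit the quota limit and could not launch it...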

The following three instance types were selectable.

ml.p5.48xlarge
ml.p5en.48xlarge
ml.p5e.48xlarge

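P5 instances are the instance types that give you NVIDIA H100 GPUs (the P5e and P5en variants use the H200).

As for the price...

The selectable instance types were the same for gpt-oss-20b and gpt-oss-120b, so I hope the 20b model will eventually run on somewhat cheaper instances.

Selecting Preview Notebook displays a notebook for deploying with JumpStart.

I tried it in a Jupyter environment, but the SageMaker library version apparently had not caught up yet, so it failed with an error and I gave up.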

Added 2025/08/08

It's on Bedrock too!

gpt-oss has come not only to SageMaker AI but also to Bedrock.

As of 2025/8/8, it is available only in the Oregon region (us-west-2).

Let's call it quickly.

InvokeModel

import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-west-2")

body = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "assistant", "content": "Hello! How can I help you today?"},
        {"role": "user", "content": "What is the weather like today?"},
    ],
    "max_completion_tokens": 150,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False,
}

response = client.invoke_model(
    modelId="openai.gpt-oss-120b-1:0",
    body=json.dumps(body),
)

response_body = json.loads(response["body"].read().decode("utf-8"))

for choice in response_body["choices"]:
    print(choice["message"]["content"])
    print("---")

<reasoning>The user asks: "What is the weather like today?" As ChatGPT, we don't have real-time data. We need to respond that we cannot provide real-time weather. Ask for location or suggest checking a weather service. Also note we don't have browsing. Provide general advice.</reasoning>I’m not able to pull in real‑time weather data, so I can’t give you the exact conditions for today. If you let me know the city or region you’re interested in, I can suggest typical climate patterns for this time of year or point you toward reliable sources (like a weather app, a national meteorological service, or a site such as weather.com) where you can get the current forecast
---

The <reasoning> tag comes back together with the answer in a single content string, so you have to strip it out yourself.
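Stripping the tag could look something like the sketch below. It assumes the reasoning always arrives wrapped in a literal `<reasoning>...</reasoning>` pair, as in the output above; `split_reasoning` is a hypothetical helper name of my own, not part of boto3 or Bedrock.

```python
import re

# Matches one <reasoning>...</reasoning> span, including newlines inside it.
# Assumption: the tag is always spelled exactly like this and is never nested.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a single content string."""
    reasoning = "\n".join(m.strip() for m in REASONING_RE.findall(content))
    answer = REASONING_RE.sub("", content).strip()
    return reasoning, answer


sample = "<reasoning>No real-time data.</reasoning>I can't check the weather."
reasoning, answer = split_reasoning(sample)
print(answer)
```

In the InvokeModel loop above you would pass `choice["message"]["content"]` to the helper instead of printing it directly.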

Converse

import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")


system = [{"text": "You are a helpful assistant."}]
messages = [
    {"role": "user", "content": [{"text": "こんにちは！俳句を一句作って！"}]},
]


response = client.converse(
    modelId="openai.gpt-oss-120b-1:0",
    messages=messages,
    system=system,
    inferenceConfig={
        "temperature": 0.7,
        "topP": 0.9,
        "maxTokens": 5120,
    },
    additionalModelRequestFields={
        "reasoning_effort": "high",
    },
)


for content in response["output"]["message"]["content"]:
    match content:
        case {"reasoningContent": reasoningContent}:
            print("---reasoning start---")
            print(reasoningContent["reasoningText"]["text"])
            print("---reasoning end---")
        case {"text": text}:
            print(text)

---reasoning start---
The user says: "こんにちは！俳句を一句作って！" which translates to "Hello! Please make a haiku for me!" They are speaking Japanese. So we need to respond in Japanese presumably. They want a haiku. The assistant should comply, produce a haiku. Must be in the correct form: 5-7-5 mora (Japanese syllables). The assistant should create a haiku. Possibly also ask if they want a particular theme. But the user just wants a haiku. We can comply. Provide a haiku, maybe with a seasonal word (kigo). The assistant should produce a short haiku. They could also give an English translation optionally. But the user didn't ask for translation. They just said "こんにちは！俳句を一句作って！" So we can give a haiku in Japanese, maybe with a seasonal reference. Also maybe we can ask about preferences, but it's optional. The user is likely just wanting a haiku. So we can comply.

... (omitted, because it thinks an awful lot) ...

Thus final answer.
---reasoning end---
こんにちは！俳句を作ってみました。どうぞご覧ください。

**春雨や**  
**窓辺の花が**  
**揺れ落ちる**  

春の雨が静かに降り、窓辺に咲く花が優しく揺れる様子を詠んでみました。

reasoning_effort can be specified through additionalModelRequestFields.
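If you switch the effort level often, a small wrapper that builds the Converse arguments may be handy. This is only a sketch: `build_converse_kwargs` is a hypothetical helper, and the allowed values ("low", "medium", "high") are my assumption based on the levels gpt-oss advertises. Only "high" was actually tried in this article.

```python
# Assumption: these are the accepted values; only "high" was tested above.
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_converse_kwargs(
    prompt: str,
    effort: str = "medium",
    model_id: str = "openai.gpt-oss-120b-1:0",
    max_tokens: int = 5120,
) -> dict:
    """Build keyword arguments for client.converse() with a reasoning effort."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
        # Model-specific fields go here rather than in inferenceConfig.
        "additionalModelRequestFields": {"reasoning_effort": effort},
    }


# Usage (needs AWS credentials, so it is left commented out):
# response = client.converse(**build_converse_kwargs("Hello!", effort="low"))
```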

Streaming is not supported 😭

I don't know the reason, but the gpt-oss models available on Bedrock do not support streaming.

Calling them with invoke_model_with_response_stream or converse_stream results in an error:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation: The model is unsupported for streaming.
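If the same code has to work with both streaming and non-streaming models, one option is to try converse_stream first and fall back to converse when this error comes back. The following is a minimal sketch, not a definitive implementation: `converse_text` is a hypothetical helper, and it treats any ValidationException as "streaming unsupported", which is a simplification (a malformed request raises the same error code and would then fail again in converse).

```python
def converse_text(client, **kwargs) -> str:
    """Try converse_stream first; fall back to converse if it is rejected."""
    try:
        response = client.converse_stream(**kwargs)
        parts = []
        for event in response["stream"]:
            # Text arrives in contentBlockDelta events; other events are skipped.
            delta = event.get("contentBlockDelta", {}).get("delta", {})
            if "text" in delta:
                parts.append(delta["text"])
        return "".join(parts)
    except Exception as err:
        # botocore's ClientError exposes the service error in err.response.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code != "ValidationException":
            raise

    # Non-streaming fallback: join the text blocks, skipping reasoning blocks.
    response = client.converse(**kwargs)
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)
```

The error is inspected by duck typing instead of importing botocore, so the helper itself has no dependencies. Pass it the same `client`, `modelId`, `messages`, and other arguments used in the Converse example.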

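But wait, there is an OpenAI-compatible API!

By way of compensation, so to speak, gpt-oss can be called through OpenAI's Chat Completions API.

To use the OpenAI-compatible API, you authenticate with the API key method that was added recently.

You can call it with the OpenAI library.

pip install openai

The code looks like this.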

from openai import OpenAI
import os

region = "us-west-2"

client = OpenAI(
    base_url=f"https://bedrock-runtime.{region}.amazonaws.com/openai/v1",
    api_key=os.environ.get("AWS_BEARER_TOKEN_BEDROCK"),
)

completion = client.chat.completions.create(
    model="openai.gpt-oss-20b-1:0",
    messages=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)

Unfortunately, this does not support streaming either.
