
Serialization error with Gemini Batch #664

Open
@paras-genmo

Description


I'm getting the error below when using Gemini batch, and I'm struggling to debug it. It seems to be related to how curator serializes the Gemini batch request before writing it to GCS?

Failed to import data. Please check 'Prepare input' section of the batch predictions documentation. For Claude models, see https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude-batch#prepare_input Invalid value: Error while reading data, error message: Failed to parse JSON: No active field found.; ParsedString returned false; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Parser terminated before end of string File: requests_0.jsonl at [1:1]
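The only clue is a parse failure at [1:1] of requests_0.jsonl. It can therefore help to download that file from GCS and check it locally. The sketch below assumes the documented Vertex AI batch input shape: one JSON object per line, with a top-level "request" holding a GenerateContentRequest. It also treats text, fileData and inlineData (camelCase or snake_case) as the only part types of interest. One plausible reading of "No active field found" is that some part has none of the expected data fields set, for example a bare string where an object is expected. That reading is an assumption, not a confirmed diagnosis, so the check simply flags parts that are not objects or that do not carry exactly one data field. validate_batch_jsonl is a hypothetical helper, not part of curator or the Google SDKs.

```python
import json
from typing import Iterable, List

# Assumed set of Part data fields from the Vertex AI GenerateContentRequest
# JSON, in both the camelCase and snake_case spellings protobuf JSON accepts.
KNOWN_PART_FIELDS = {"text", "fileData", "file_data", "inlineData", "inline_data"}


def validate_batch_jsonl(lines: Iterable[str]) -> List[str]:
    """Hypothetical checker: return problems found in batch input lines."""
    problems: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are skipped rather than reported
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
            continue
        request = record.get("request") if isinstance(record, dict) else None
        if not isinstance(request, dict):
            problems.append(f"line {lineno}: missing top-level 'request' object")
            continue
        contents = request.get("contents")
        if not isinstance(contents, list) or not contents:
            problems.append(f"line {lineno}: 'request.contents' is missing or empty")
            continue
        for c_idx, content in enumerate(contents):
            parts = content.get("parts") if isinstance(content, dict) else None
            if not isinstance(parts, list):
                problems.append(f"line {lineno}: contents[{c_idx}] has no 'parts' list")
                continue
            for p_idx, part in enumerate(parts):
                where = f"line {lineno}: contents[{c_idx}].parts[{p_idx}]"
                # Each part should be an object carrying exactly one data field.
                if not isinstance(part, dict):
                    problems.append(f"{where} is a {type(part).__name__}, expected an object")
                    continue
                found = KNOWN_PART_FIELDS.intersection(part)
                if len(found) != 1:
                    problems.append(f"{where} has {len(found)} known data fields, expected 1")
    return problems
```

To check a downloaded copy, open requests_0.jsonl and pass the file handle to validate_batch_jsonl, then print the returned list. An empty list means every line matched the assumed shape.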

This is a representative curator pass:

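from typing import List

from bespokelabs import curator


class HarmfulContentDetector(curator.LLM):
    """Ask a Gemini model whether each image (referenced by a gs:// URI) is harmful."""

    def prompt(self, row) -> List[dict]:
        prompt = "Does this image contain harmful content? Reply only Yes or No."
        # Gemini-style file reference pointing at the image in GCS.
        image_part = {
            "fileData": {
                "mimeType": "image/jpeg",
                "fileUri":  row["image_uri"],
            }
        }
        # One user message whose content mixes a plain text string and the image part.
        return [{"role": "user", "content": [prompt, image_part]}]

    def parse(self, row, resp):
        # Fall back to "ERROR" when the response is missing or blank.
        return {
            "image_uri": row["image_uri"],
            "boxed": (resp or "").strip() or "ERROR",
        }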

I'm passing gs:// URIs directly in the image_uri column.
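For comparison with what curator actually uploads, here is a minimal sketch of one input line in the Vertex AI batch format for the prompt above. The text is wrapped in a {"text": ...} part, and the image is passed as a fileData part pointing at GCS. build_request_line and the gs://example-bucket/cat.jpg URI are hypothetical. This is not curator's serializer, and this issue does not establish whether curator emits the same shape for a mixed [str, dict] content list.

```python
import json


def build_request_line(prompt: str, image_uri: str, mime_type: str = "image/jpeg") -> str:
    """Hypothetical helper: one Vertex AI batch input line (text part + GCS image part)."""
    request = {
        "request": {
            "contents": [
                {
                    "role": "user",
                    "parts": [
                        # In raw request JSON, text goes inside a {"text": ...} object.
                        {"text": prompt},
                        # Reference the image in GCS by URI instead of inlining its bytes.
                        {"fileData": {"mimeType": mime_type, "fileUri": image_uri}},
                    ],
                }
            ]
        }
    }
    # json.dumps emits a single line by default, as JSONL requires.
    return json.dumps(request)
```

Download requests_0.jsonl (for example with gsutil cp) and diff one of its lines against this output. That comparison should show whether a part was written as a bare string or under an unexpected key.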

Metadata

Labels: curator-batch (Related batch request processor); duplicate (This issue or pull request already exists)
