The Microsoft.Extensions.AI.Evaluation.Quality package now includes three new evaluators specifically designed to assess how well AI agents perform in conversational scenarios involving tool use: ToolCallAccuracyEvaluator, TaskAdherenceEvaluator, and IntentResolutionEvaluator.
We've also introduced a new package, Microsoft.Extensions.AI.Evaluation.NLP, containing evaluators that implement common NLP algorithms for evaluating text similarity: BLEUEvaluator, GLEUEvaluator, and F1Evaluator.
[alert type="note" heading="Note"]Unlike other evaluators in the Microsoft.Extensions.AI.Evaluation libraries, the NLP evaluators do not require an AI model to perform evaluations. Instead, they use traditional NLP techniques such as text tokenization and n-gram analysis to compute similarity scores.[/alert]
These new evaluators complement the quality- and safety-focused evaluators we covered in earlier posts. Together with custom, domain-specific evaluators that you can create using the Microsoft.Extensions.AI.Evaluation libraries, they provide a robust evaluation toolkit for your .NET AI applications.
The agent quality evaluators require an LLM to perform evaluation. The code example that follows shows how to create an IChatClient that connects to a model deployed on Azure OpenAI. For instructions on how to deploy an OpenAI model in Azure, see Create and deploy an Azure OpenAI in Azure AI Foundry Models resource.
[alert type="note" heading="Note"]We recommend using the GPT-4o or GPT-4.1 series of models when running the below example.
While the Microsoft.Extensions.AI.Evaluation libraries and the underlying core abstractions in Microsoft.Extensions.AI support a variety of different models and LLM providers, the evaluation prompts used within the evaluators in the Microsoft.Extensions.AI.Evaluation.Quality package have been tuned and tested against OpenAI models such as GPT-4o and GPT-4.1. It is possible to use other models by supplying an IChatClient that can connect to your model of choice. However, the performance of those models against the evaluation prompts may vary and may be especially poor for smaller / local models.[/alert]
First, set the required environment variables. For this, you will need the endpoint for your Azure OpenAI resource, and the deployment name for your deployed model. You can copy these values from the Azure portal and paste them in the environment variables below.
SET EVAL_SAMPLE_AZURE_OPENAI_ENDPOINT=https://<your azure openai resource name>.openai.azure.com/
SET EVAL_SAMPLE_AZURE_OPENAI_MODEL=<your model deployment name (e.g., gpt-4o)>
The example uses DefaultAzureCredential for authentication. You can sign in to Azure using developer tooling such as Visual Studio or the Azure CLI.
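For example, if you use the Azure CLI, signing in from a terminal is enough for DefaultAzureCredential to pick up your credentials:
az login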
Next, let's create a new test project to demonstrate the new evaluators. You can use Visual Studio, Visual Studio Code, or the .NET CLI; for example, from the command line:
dotnet new mstest -n EvaluationTests
cd EvaluationTests
After creating the project, add the necessary NuGet packages:
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI.Evaluation
dotnet add package Microsoft.Extensions.AI.Evaluation.Quality
dotnet add package Microsoft.Extensions.AI.Evaluation.NLP --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
Next, copy the following code into the project (inside Test1.cs). The example demonstrates how to run agent quality and NLP evaluators via two separate unit tests defined in the same test class.
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.NLP;
using Microsoft.Extensions.AI.Evaluation.Quality;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using Microsoft.Extensions.AI.Evaluation.Reporting.Storage;
using DescriptionAttribute = System.ComponentModel.DescriptionAttribute;
namespace EvaluationTests;
#pragma warning disable AIEVAL001 // The agent quality evaluators used below are currently marked as [Experimental].
[TestClass]
public class Test1
{
private static readonly ReportingConfiguration s_agentQualityConfig = CreateAgentQualityReportingConfiguration();
private static readonly ReportingConfiguration s_nlpConfig = CreateNLPReportingConfiguration();
[TestMethod]
public async Task EvaluateAgentQuality()
{
// This example demonstrates how to run agent quality evaluators (ToolCallAccuracyEvaluator,
// TaskAdherenceEvaluator, and IntentResolutionEvaluator) that assess how well an AI agent performs tasks
// involving tool use and conversational interactions.
await using ScenarioRun scenarioRun = await s_agentQualityConfig.CreateScenarioRunAsync("Agent Quality");
// Get a conversation that simulates a customer service agent using tools to assist a customer.
(List<ChatMessage> messages, ChatResponse response, List<AITool> toolDefinitions) =
await GetCustomerServiceConversationAsync(chatClient: scenarioRun.ChatConfiguration!.ChatClient);
// The agent quality evaluators require tool definitions to assess tool-related behaviors.
List<EvaluationContext> additionalContext =
[
new ToolCallAccuracyEvaluatorContext(toolDefinitions),
new TaskAdherenceEvaluatorContext(toolDefinitions),
new IntentResolutionEvaluatorContext(toolDefinitions)
];
// Run the agent quality evaluators against the response.
EvaluationResult result = await scenarioRun.EvaluateAsync(messages, response, additionalContext);
// Retrieve one of the metrics (example: Intent Resolution).
NumericMetric intentResolution = result.Get<NumericMetric>(IntentResolutionEvaluator.IntentResolutionMetricName);
// By default, a Value < 4 is interpreted as a failing score for the Intent Resolution metric.
Assert.IsFalse(intentResolution.Interpretation!.Failed);
// Results are also persisted to disk under the storageRootPath specified below. You can use the dotnet aieval
// command line tool to generate an HTML report and view these results.
}
[TestMethod]
public async Task EvaluateNLPMetrics()
{
// This example demonstrates how to run NLP (Natural Language Processing) evaluators (BLEUEvaluator,
// GLEUEvaluator and F1Evaluator) that measure text similarity between a model's output and supplied reference
// text.
await using ScenarioRun scenarioRun = await s_nlpConfig.CreateScenarioRunAsync("NLP");
// Set up the text similarity evaluation inputs. Response represents an example model output, and
// referenceResponses represent a set of ideal responses that the model's output will be compared against.
const string Response =
"Paris is the capital of France. It's famous for the Eiffel Tower, Louvre Museum, and rich cultural heritage";
List<string> referenceResponses =
[
"Paris is the capital of France. It is renowned for the Eiffel Tower, Louvre Museum, and cultural traditions.",
"Paris, the capital of France, is famous for its landmarks like the Eiffel Tower and vibrant culture.",
"The capital of France is Paris, known for its history, art, and iconic landmarks like the Eiffel Tower."
];
// The NLP evaluators require one or more reference responses to compare against the model's output.
List<EvaluationContext> additionalContext =
[
new BLEUEvaluatorContext(referenceResponses),
new GLEUEvaluatorContext(referenceResponses),
new F1EvaluatorContext(groundTruth: referenceResponses.First())
];
// Run the NLP evaluators.
EvaluationResult result = await scenarioRun.EvaluateAsync(Response, additionalContext);
// Retrieve one of the metrics (example: F1).
NumericMetric f1 = result.Get<NumericMetric>(F1Evaluator.F1MetricName);
// By default, a Value < 0.5 is interpreted as a failing score for the F1 metric.
Assert.IsFalse(f1.Interpretation!.Failed);
// Results are also persisted to disk under the storageRootPath specified below. You can use the dotnet aieval
// command line tool to generate an HTML report and view these results.
}
private static ReportingConfiguration CreateAgentQualityReportingConfiguration()
{
// Create an IChatClient to interact with a model deployed on Azure OpenAI.
string endpoint = Environment.GetEnvironmentVariable("EVAL_SAMPLE_AZURE_OPENAI_ENDPOINT")!;
string model = Environment.GetEnvironmentVariable("EVAL_SAMPLE_AZURE_OPENAI_MODEL")!;
var client = new AzureOpenAIClient(new Uri(endpoint), new DefaultAzureCredential());
IChatClient chatClient = client.GetChatClient(deploymentName: model).AsIChatClient();
// Enable function invocation support on the chat client. This allows the chat client to invoke AIFunctions
// (tools) defined in the conversation.
chatClient = chatClient.AsBuilder().UseFunctionInvocation().Build();
// Create a ReportingConfiguration for the agent quality evaluation scenario.
return DiskBasedReportingConfiguration.Create(
storageRootPath: "./eval-results", // The evaluation results will be persisted to disk under this folder.
evaluators: [new ToolCallAccuracyEvaluator(), new TaskAdherenceEvaluator(), new IntentResolutionEvaluator()],
chatConfiguration: new ChatConfiguration(chatClient),
enableResponseCaching: true);
// Since response caching is enabled above, all LLM responses produced via the chatClient above will also be
// cached under the storageRootPath so long as the inputs being evaluated stay unchanged, and so long as the
// cache entries do not expire (cache expiry is set at 14 days by default).
}
private static ReportingConfiguration CreateNLPReportingConfiguration()
{
// Create a ReportingConfiguration for the NLP evaluation scenario.
// Note that the NLP evaluators do not require an LLM to perform the evaluation. Instead, they use traditional
// NLP techniques (text tokenization, n-gram analysis, etc.) to compute text similarity scores.
return DiskBasedReportingConfiguration.Create(
storageRootPath: "./eval-results", // The evaluation results will be persisted to disk under this folder.
evaluators: [new BLEUEvaluator(), new GLEUEvaluator(), new F1Evaluator()]);
}
private static async Task<(List<ChatMessage> messages, ChatResponse response, List<AITool> toolDefinitions)>
GetCustomerServiceConversationAsync(IChatClient chatClient)
{
// Get a conversation that simulates a customer service agent using tools (such as GetOrders() and
// GetOrderStatus() below) to assist a customer.
List<ChatMessage> messages =
[
new ChatMessage(ChatRole.System, "You are a helpful customer service agent. Use tools to assist customers."),
new ChatMessage(ChatRole.User, "Could you tell me the status of the last 2 orders on my account #888?")
];
List<AITool> toolDefinitions = [AIFunctionFactory.Create(GetOrders), AIFunctionFactory.Create(GetOrderStatus)];
var options = new ChatOptions() { Tools = toolDefinitions, Temperature = 0.0f };
ChatResponse response = await chatClient.GetResponseAsync(messages, options);
return (messages, response, toolDefinitions);
}
[Description("Gets the orders for a customer")]
private static IReadOnlyList<CustomerOrder> GetOrders(
[Description("The customer account number")] int accountNumber)
{
return accountNumber switch
{
888 => [new CustomerOrder(123), new CustomerOrder(124)],
_ => throw new InvalidOperationException($"Account number {accountNumber} is not valid.")
};
}
[Description("Gets the delivery status of an order")]
private static CustomerOrderStatus GetOrderStatus(
[Description("The order ID to check")] int orderId)
{
return orderId switch
{
123 => new CustomerOrderStatus(orderId, "shipped", DateTime.Now.AddDays(1)),
124 => new CustomerOrderStatus(orderId, "delayed", DateTime.Now.AddDays(10)),
_ => throw new InvalidOperationException($"Order with ID {orderId} not found.")
};
}
private record CustomerOrder(int OrderId);
private record CustomerOrderStatus(int OrderId, string Status, DateTime ExpectedDelivery);
}
Next, let's run the above unit tests. You can use Visual Studio or Visual Studio Code's Test Explorer, or run dotnet test from the command line.
After running the tests, you can generate an HTML report containing results for both the "Agent Quality" and "NLP" scenarios in the example above using the dotnet aieval tool.
First, install the tool locally in your project:
dotnet tool install Microsoft.Extensions.AI.Evaluation.Console --create-manifest-if-needed
Then generate and open the report:
dotnet aieval report -p <path to 'eval-results' folder under the build output directory for the above project> -o .\report.html --open
The --open flag will automatically open the generated report in your default browser, allowing you to explore the evaluation results interactively. Here's a peek at the generated report – this screenshot shows the details revealed when you click on the "Intent Resolution" metric under the "Agent Quality" scenario.
For more comprehensive examples that demonstrate various API concepts, functionality, best practices, and common usage patterns for the Microsoft.Extensions.AI.Evaluation libraries, explore the API Usage Examples in the dotnet/ai-samples repository. Documentation and tutorials for the evaluation libraries are also available under The Microsoft.Extensions.AI.Evaluation libraries.
We encourage you to try out these evaluators in your AI applications and share your feedback. If you encounter any issues or have suggestions for improvements, please report them on GitHub. Your feedback helps us continue to enhance the evaluation libraries and build better tools for the .NET AI development community.
Happy evaluating!
You may remember that the semiannual updates used to be called the Spring and Fall releases. For example, we had the 2017 Fall Creators Update and the 2018 Spring Update. Why the name change?
It was during an all-hands meeting that a senior executive asked if the organization had any unconscious biases. One of my colleagues raised his hand. He grew up in the Southern Hemisphere, where the seasons are opposite from those in the Northern Hemisphere. He pointed out that naming the updates Spring and Fall shows a Northern Hemisphere bias and is not inclusive of our customers in the Southern Hemisphere.
The names of the semiannual releases were changed the next day to be hemisphere-neutral.
Suppose you change a member's access level: completing that edit means moving declarations into the private sections of the class, adding getter/setter methods, and updating all references to respect this new access level.
GitHub Copilot now supports Next Edit Suggestions (or NES for short) to predict the next edits to come. NES in GitHub Copilot helps you stay in flow by not only helping predict where you’ll need to make updates, but also what you’ll need to change next.
For example, after you update a variable to use the std::string type, NES predicts and suggests updates across all applicable areas near the cursor. NES replaces calls to fgets with calls to std::getline, and replaces atoi with the C++ std::stoi, which has better error handling.
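To make that concrete, here is a minimal before/after sketch of the kind of edit NES proposes in this scenario; the function and variable names are illustrative, not taken from the original example.
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

// Before: C-style input handling.
int ReadValueBefore()
{
    char buffer[64];
    std::fgets(buffer, sizeof(buffer), stdin);
    return std::atoi(buffer); // silently returns 0 on invalid input
}

// After: the kind of update NES suggests once the code uses std::string.
int ReadValueAfter()
{
    std::string line;
    std::getline(std::cin, line);
    return std::stoi(line); // throws on invalid or out-of-range input
}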
You can now automatically inject the dependency scanning task into any pipeline run targeting your default branch. This is a quick way to ensure that your production code (and any code being merged into your production branch) is evaluated for open-source dependency vulnerabilities.
You'll need to have the Advanced Security: manage settings permission to make changes to your repository's Advanced Security enablement. Navigate to a specific repository's settings page: Project settings > Repositories > Select your repository.
If you're using the standalone products, you first need Code Security enabled. Then, navigate to Options and confirm your selection of Dependency alerts default setup.
If you're using the bundled Advanced Security, enable the checkbox to Scan default branch for vulnerable dependencies.
Upon the next execution of a pipeline run targeting your repository's default branch, the Advanced Security dependency scanning task will be injected near the end of your pipeline. Dependency scanning completes evaluation of your dependencies and any associated vulnerabilities within a few minutes. For repositories where you may not have consistent CI/CD running, we recommend scheduled pipeline runs.
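For repositories like that, a scheduled trigger in your pipeline YAML keeps the scan current; here's a minimal sketch (the cron expression and branch name are placeholders):
schedules:
- cron: "0 3 * * 1"
  displayName: Weekly dependency scan
  branches:
    include:
    - main
  always: true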
If the task is already in your pipeline, or you've set up your pipelines to skip the dependency scanning task via the DependencyScanning.Skip: true environment variable, the injected task will be skipped. The environment variable is a great option if there are certain pipelines you don't want to include in your scanning surface area. Alternatively, if there are certain pipeline jobs you wish to skip automated scanning in, you can also set the pipeline variable dependencyScanningInjectionEnabled to false.
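For example, both variables can be set directly in the pipeline YAML; the placement below (pipeline-level versus job-level) is just an illustration:
# Skip the injected dependency scanning task for this entire pipeline
variables:
  DependencyScanning.Skip: true

jobs:
- job: Build
  # Or opt a single job out of automatic injection
  variables:
    dependencyScanningInjectionEnabled: false
  steps:
  - script: echo "build and test steps here"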
Upon successful execution of the task, results are uploaded to Advanced Security and available in the Repos > Advanced Security tab for developers to fix any findings.
You can also use this to easily set up pull request annotations for dependency scanning. If you have a build validation policy configured for your repository, dependency scanning will also automatically inject into any pull requests that target your default branch. Annotations for new findings appear directly on your pull request after you've scanned your default branch at least once, while any findings that exist in both branches will show up in the Advanced Security tab as well.
Give this feature a try! Our team is also working on more experiences to smooth out the enablement process across Advanced Security. Have any feedback? Please share that with us directly or on Developer Community.
Learn more about Advanced Security and dependency scanning.
The input_fidelity parameter in the image edits API lets you control how closely the model preserves the style and features of the original image. This is useful for editing photos (e.g., facial features, avatars), maintaining brand identity, and realistic product imagery.
The o3-deep-research model is tightly integrated with Bing Search for authoritative, up-to-date results. Read the full announcement.
A new tool_resources parameter supports custom tool resource overrides.
AIProjectClient now requires a Project endpoint; connection classes have been consolidated and renamed; UploadFileRequest is deprecated in favor of UploadFile under Datasets; and the OpenAI chat client now supports authenticated use in projects.
The previous NuGet updater used a hybrid solution that relied heavily on manual XML parsing and string replacement operations written in Ruby. While this approach worked for basic scenarios, it struggled with the complexity and nuances of modern .NET projects. The new updater takes a completely different approach by using .NET's native tooling directly.
Instead of trying to reverse-engineer what NuGet and MSBuild do, the new updater leverages actual .NET tooling, such as MSBuild's project evaluation engine and NuGet's official client libraries.
This shift from manual XML manipulation to using the actual .NET toolchain means the updater now behaves exactly like the tools developers use every day.
The improvements in the new updater are dramatic. The test suite that previously took 26 minutes now completes in just 9 minutes—a 65% reduction in runtime. But speed is only part of the story. The success rate for updates has jumped from 82% to 94%, meaning significantly fewer failed updates that require manual intervention.
These improvements work together to deliver a faster, more reliable experience. When Dependabot runs on your repository, it spends less time processing updates and succeeds more often—reducing both the wait time and the manual intervention needed to keep your dependencies current.
One of the most significant improvements is how the updater discovers and analyzes dependencies. Previously, the Ruby-based parser would attempt to parse project files as XML and guess what the final dependency graph would look like. This approach was fragile and missed complex scenarios.
The new updater uses MSBuild's project evaluation engine to properly understand your project's true dependency structure. This means it can now handle complex scenarios that previously caused problems.
For example, the old parser missed conditional package references like this:
<ItemGroup Condition="'$(TargetFramework)' == 'net8.0'">
<PackageReference Include="Microsoft.Extensions.Hosting" Version="8.0.0" />
</ItemGroup>
With the new MSBuild-based approach, the updater can handle Directory.Build.props and Directory.Build.targets files that modify dependencies.
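For instance, a repository-level Directory.Build.props like the following (a hypothetical sketch; the condition and package are illustrative) introduces a dependency that never appears in any individual project file:
<Project>
  <ItemGroup Condition="'$(Configuration)' == 'Debug'">
    <PackageReference Include="Microsoft.Extensions.Logging" Version="8.0.0" />
  </ItemGroup>
</Project>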
One of the most impressive features of the new updater is its sophisticated dependency resolution engine. Instead of updating packages in isolation, it now performs comprehensive conflict resolution. This includes two key capabilities:
When you have a vulnerable transitive dependency that can't be directly updated, the updater will now automatically find the best way to resolve the vulnerability. Let's look at a real scenario where your app depends on a package that has a vulnerable transitive dependency:
YourApp
└── PackageA v1.0.0
└── TransitivePackage v2.0.0 (CVE-2024-12345)
The new updater follows a smart resolution strategy:
First, it checks if PackageA has a newer version available that depends on a non-vulnerable version of TransitivePackage. If PackageA v2.0.0 depends on TransitivePackage v3.0.0 (which fixes the vulnerability), Dependabot will update PackageA to v2.0.0.
If no updated version of PackageA is available, Dependabot will add a direct dependency on a non-vulnerable version of TransitivePackage to your project. This leverages NuGet's 'direct dependency wins' rule, where direct dependencies take precedence over transitive ones:
<PackageReference Include="PackageA" Version="1.0.0" />
<PackageReference Include="TransitivePackage" Version="3.0.0" />
With this approach, even though PackageA v1.0.0 still references TransitivePackage v2.0.0, NuGet will use v3.0.0 because it's a direct dependency of your project. This ensures your application uses the secure version without waiting for PackageA to be updated.
The updater also identifies and updates related packages to avoid version conflicts. If updating one package in a family (like Microsoft.Extensions.* packages) would create version mismatches with related packages, the updater automatically updates the entire family to compatible versions.
This intelligent conflict resolution dramatically reduces the number of failed updates and eliminates the manual work of resolving package conflicts.
The new updater now properly respects global.json files, a feature that was inconsistently supported in the previous version. If your project specifies a particular .NET SDK version, the updater will install the exact SDK version specified in your global.json. This ensures that the updater evaluates dependency updates using the same .NET SDK version that your development team and CI/CD pipelines use, eliminating a common source of inconsistencies.
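As a reminder, a global.json pins the SDK like this (the version number is illustrative):
{
  "sdk": {
    "version": "8.0.100"
  }
}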
This improvement complements Dependabot's recently added capability to update .NET SDK versions in global.json files. While the SDK updater keeps your .NET SDK version current with security patches and improvements, the NuGet updater respects whatever SDK version you've chosen—whether manually specified or automatically updated by Dependabot. This seamless integration means you get the best of both worlds: automated SDK updates when you want them, and consistent package dependency resolution that honors your SDK choices.
Central Package Management (CPM) has become increasingly popular in .NET projects for managing package versions across multiple projects. The previous updater had limited support for CPM scenarios, often requiring manual intervention.
The new updater provides comprehensive CPM support. It automatically detects Directory.Packages.props files, properly updates versions in centralized version files, supports package overrides in individual projects, and handles transitive dependencies managed through CPM. Whether you're using CPM for version management, security vulnerability management, or both, the new updater handles these scenarios seamlessly.
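For example, with CPM the package versions live in a single Directory.Packages.props at the repository root, and that's the file the updater now edits (package names and versions here are illustrative):
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <PackageVersion Include="Microsoft.Extensions.Hosting" Version="8.0.0" />
    <PackageVersion Include="Newtonsoft.Json" Version="13.0.3" />
  </ItemGroup>
</Project>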
The previous updater struggled with private NuGet feeds, especially those with non-standard authentication or API implementations. The new updater uses NuGet's official client libraries, which means it automatically supports all NuGet v2 and v3 feeds, including nuget.org, Azure Artifacts, and GitHub Packages.
If your .NET tools can access a feed, Dependabot can too.
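Concretely, a private feed is typically registered in a NuGet.config in the repository, and the updater reads the same file your local restore does; the feed name and URL below are placeholders:
<configuration>
  <packageSources>
    <add key="internal-feed" value="https://pkgs.dev.azure.com/yourorg/_packaging/internal-feed/nuget/v3/index.json" />
  </packageSources>
</configuration>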
If you're using Dependabot for .NET projects, you should notice these improvements immediately. Faster updates mean dependency scans and update generation happen more quickly. More successful updates result in fewer failed updates that require manual intervention. Better accuracy ensures updates that properly respect your project's configuration and constraints. And when updates do fail, you'll get clearer errors with actionable error messages.
You don't need to change anything in your dependabot.yml configuration—you automatically get these improvements for all .NET projects.
This rewrite represents more than just performance improvements—it's a foundation for future enhancements. By building on .NET's native tooling, the Dependabot team will be able to add support for new .NET features as they're released, improve integration with .NET developer workflows, extend capabilities to handle more complex enterprise scenarios, and provide better diagnostics and debugging information.
The new architecture also makes it easier for the community to contribute improvements and fixes, as we rewrote the codebase in C# and leverage the same tools and libraries that .NET developers use every day. This means that developers can make contributions using familiar .NET development practices, making it easier for the community to help shape the future of Dependabot's NuGet support.
The new NuGet updater is already live and processing updates for .NET repositories across GitHub. If you haven't enabled Dependabot for your .NET projects yet, now is a great time to start. Here's a minimal configuration to get you started:
version: 2
updates:
  - package-ecosystem: "nuget"
    directory: "/"
    schedule:
      interval: "weekly"
And if you're already using Dependabot, you should already be seeing the improvements. Faster updates, fewer failures, and clearer error messages—all without changing a single line of configuration.
The rewrite demonstrates how modern dependency management should work: fast, accurate, and transparent. By leveraging the same tools that developers use every day, Dependabot can now provide an experience that feels native to the .NET ecosystem while delivering the automation and security benefits that make dependency management less of a chore.
The nNumberOfBytesToRead parameter to ReadFile is a 32-bit unsigned integer, which limits the number of bytes that can be read at once to 4GB. What if you need to read more than 4GB?
The ReadFile function cannot read more than 4GB of data at a time. At the time the function was originally written, all Win32 platforms were 32-bit, so reading more than 4GB of data into memory was impossible because the address space didn't have room for a buffer that large.
When Windows was expanded from 32-bit to 64-bit, the byte count was not expanded. I don't know the reason for certain, but it was probably a combination of (1) not wanting to change the ABI more than necessary, so that it would be easier to port 32-bit device drivers to 64-bit, and (2) having no practical demand for reading that much data in a single call.
You can work around the problem by writing a helper function that breaks the large read into chunks of less than 4GB each.
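Here's a minimal sketch of such a helper; the function name and the chunk cap are arbitrary choices, and error handling is reduced to the essentials:
#include <windows.h>
#include <cstdint>

// Reads up to byteCount bytes by issuing multiple ReadFile calls,
// each asking for less than 4GB.
bool ReadFileHuge(HANDLE file, void* buffer, uint64_t byteCount, uint64_t* totalRead)
{
    auto* dest = static_cast<uint8_t*>(buffer);
    *totalRead = 0;
    while (byteCount > 0) {
        // Cap each request below the 32-bit limit of nNumberOfBytesToRead.
        DWORD toRead = byteCount > 0xFFFF0000 ? 0xFFFF0000 : static_cast<DWORD>(byteCount);
        DWORD read = 0;
        if (!ReadFile(file, dest, toRead, &read, nullptr)) {
            return false; // GetLastError() has the details
        }
        *totalRead += read;
        if (read < toRead) {
            break; // reached end of file
        }
        dest += read;
        byteCount -= read;
    }
    return true;
}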
But reading 4GB of data into memory seems awfully unusual. Do you really need all of it in memory at once? Maybe you can just read the parts you need as you need them. Or you can use a memory-mapped file to make this on-demand reading transparent. (Though at a cost of having to deal with in-page exceptions if the read cannot be satisfied.)
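And here is a sketch of the memory-mapped alternative, assuming a 64-bit process with enough address space; the helper name is arbitrary, and the caller is responsible for unmapping the view and closing the mapping handle:
#include <windows.h>

// Maps an entire file into the address space so its contents can be read on demand.
void* MapWholeFileForRead(HANDLE file, HANDLE* mappingOut)
{
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) return nullptr;

    void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0); // zero size maps the full file
    if (!view) {
        CloseHandle(mapping);
        return nullptr;
    }

    *mappingOut = mapping; // call UnmapViewOfFile(view) and CloseHandle(mapping) when done
    return view;
}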