
Now the results are:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| MasterBranch | 241.94 ns | 4.832 ns | 5.564 ns |
| PrBranch | 24.65 ns | 0.305 ns | 0.270 ns |

But the feedback I got is that BenchmarkTest.precomputed might be stored in a CPU register or some other fast cache. In that case the benchmark results would be misleading, because the PrBranch benchmark is essentially "cheating": in reality, many processes and threads compete for the fast cache, as it is a scarce resource.

Do you agree with the feedback I got? Could anyone give me a tip on how to improve the benchmark a bit?

Thank you!

Accepted answer:

Take 3 (I told you it's difficult!)

Code

```csharp
using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

public class BenchmarkTest
{
    private static readonly decimal[] precomputedCache = new decimal[]
    {
        1m,
        0.1m,
        0.01m,
        0.001m,
        0.0001m,
        0.00001m,
        0.000001m,
        0.0000001m,
        0.00000001m,
        0.000000001m,
        0.0000000001m,
        0.00000000001m,
        0.000000000001m,
        0.0000000000001m,
        0.00000000000001m,
        0.000000000000001m,
        0.0000000000000001m,
        0.00000000000000001m,
        0.000000000000000001m,
        0.0000000000000000001m,
    };

    [Benchmark]
    public decimal Calculate()
    {
        decimal r = 0;

        for (int decimals = 0; decimals < 5; decimals++)
        {
            r += (decimal) Math.Pow(10, -decimals);
        }

        return r;
    }

    [Benchmark]
    public decimal Cache()
    {
        decimal r = 0;

        for (int decimals = 0; decimals < 5; decimals++)
        {
            r += precomputedCache[decimals];
        }

        return r;
    }

    [Benchmark]
    public decimal Ram()
    {
        decimal r = 0;

        for (int decimals = 0; decimals < 5; decimals++)
        {
            CacheHelper.FlushCache();
            r += precomputedCache[decimals];
        }

        return r;
    }

    [Benchmark]
    public void FlushCacheOverhead()
    {
        for (int decimals = 0; decimals < 5; decimals++)
        {
            CacheHelper.FlushCache();
        }
    }
}

static class CacheHelper
{

    private static readonly byte[] cacheFlusher1 = GetCacheFlusher();
    private static readonly byte[] cacheFlusher2 = GetCacheFlusher();

    private static byte[] GetCacheFlusher()
    {
        Processor.GetPerCoreCacheSizes(out var l1, out var l2, out var l3);
        var totalCacheSize = l1 + l2 + l3;
        return new byte[totalCacheSize];
    }

    // You can probably get the actual cache line size from the processor information, but I just used the common 64 size because I'm lazy.
    const int cacheLineSize = 64;
    // Write to field to prevent dead code elimination.
    public static byte holder;
    private static bool flusher1;

    public static void FlushCache()
    {
        flusher1 = !flusher1;
        var array = flusher1 ? cacheFlusher1 : cacheFlusher2;
        // Touch every cache line to flush the cache.
        for (int i = 0, max = array.Length; i < max; i += cacheLineSize)
        {
            holder = array[i];
        }
    }
}

class Processor
{
    [DllImport("kernel32.dll")]
    public static extern int GetCurrentThreadId();

    //[DllImport("kernel32.dll")]
    //public static extern int GetCurrentProcessorNumber();

    [StructLayout(LayoutKind.Sequential, Pack = 4)]
    private struct GROUP_AFFINITY
    {
        public UIntPtr Mask;

        [MarshalAs(UnmanagedType.U2)]
        public ushort Group;

        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3, ArraySubType = UnmanagedType.U2)]
        public ushort[] Reserved;
    }

    [DllImport("kernel32", SetLastError = true)]
    private static extern Boolean SetThreadGroupAffinity(IntPtr hThread, ref GROUP_AFFINITY GroupAffinity, ref GROUP_AFFINITY PreviousGroupAffinity);

    [StructLayout(LayoutKind.Sequential)]
    public struct PROCESSORCORE
    {
        public byte Flags;
    };

    [StructLayout(LayoutKind.Sequential)]
    public struct NUMANODE
    {
        public uint NodeNumber;
    }

    public enum PROCESSOR_CACHE_TYPE
    {
        CacheUnified,
        CacheInstruction,
        CacheData,
        CacheTrace
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct CACHE_DESCRIPTOR
    {
        public byte Level;
        public byte Associativity;
        public ushort LineSize;
        public uint Size;
        public PROCESSOR_CACHE_TYPE Type;
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct SYSTEM_LOGICAL_PROCESSOR_INFORMATION_UNION
    {
        [FieldOffset(0)]
        public PROCESSORCORE ProcessorCore;
        [FieldOffset(0)]
        public NUMANODE NumaNode;
        [FieldOffset(0)]
        public CACHE_DESCRIPTOR Cache;
        [FieldOffset(0)]
        private UInt64 Reserved1;
        [FieldOffset(8)]
        private UInt64 Reserved2;
    }

    public enum LOGICAL_PROCESSOR_RELATIONSHIP
    {
        RelationProcessorCore,
        RelationNumaNode,
        RelationCache,
        RelationProcessorPackage,
        RelationGroup,
        RelationAll = 0xffff
    }

    public struct SYSTEM_LOGICAL_PROCESSOR_INFORMATION
    {
#pragma warning disable 0649
        public UIntPtr ProcessorMask;
        public LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
        public SYSTEM_LOGICAL_PROCESSOR_INFORMATION_UNION ProcessorInformation;
#pragma warning restore 0649
    }

    [DllImport(@"kernel32.dll", SetLastError = true)]
    public static extern bool GetLogicalProcessorInformation(IntPtr Buffer, ref uint ReturnLength);

    private const int ERROR_INSUFFICIENT_BUFFER = 122;

    private static SYSTEM_LOGICAL_PROCESSOR_INFORMATION[] _logicalProcessorInformation = null;

    public static SYSTEM_LOGICAL_PROCESSOR_INFORMATION[] LogicalProcessorInformation
    {
        get
        {
            if (_logicalProcessorInformation != null)
                return _logicalProcessorInformation;

            uint ReturnLength = 0;

            // First call with a null buffer only reports the required buffer size.
            GetLogicalProcessorInformation(IntPtr.Zero, ref ReturnLength);

            if (Marshal.GetLastWin32Error() == ERROR_INSUFFICIENT_BUFFER)
            {
                IntPtr Ptr = Marshal.AllocHGlobal((int) ReturnLength);
                try
                {
                    if (GetLogicalProcessorInformation(Ptr, ref ReturnLength))
                    {
                        int size = Marshal.SizeOf(typeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
                        int len = (int) ReturnLength / size;
                        _logicalProcessorInformation = new SYSTEM_LOGICAL_PROCESSOR_INFORMATION[len];
                        IntPtr Item = Ptr;

                        for (int i = 0; i < len; i++)
                        {
                            _logicalProcessorInformation[i] = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION) Marshal.PtrToStructure(Item, typeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
                            Item += size;
                        }

                        return _logicalProcessorInformation;
                    }
                }
                finally
                {
                    Marshal.FreeHGlobal(Ptr);
                }
            }
            return null;
        }
    }

    public static void GetPerCoreCacheSizes(out Int64 L1, out Int64 L2, out Int64 L3)
    {
        L1 = 0;
        L2 = 0;
        L3 = 0;

        var info = Processor.LogicalProcessorInformation;
        foreach (var entry in info)
        {
            if (entry.Relationship != Processor.LOGICAL_PROCESSOR_RELATIONSHIP.RelationCache)
                continue;
            // Only count caches that serve logical processor 0.
            Int64 mask = (Int64) entry.ProcessorMask;
            if ((mask & (Int64) 1) == 0)
                continue;
            var cache = entry.ProcessorInformation.Cache;
            switch (cache.Level)
            {
                case 1: L1 = L1 + cache.Size; break;
                case 2: L2 = L2 + cache.Size; break;
                case 3: L3 = L3 + cache.Size; break;
                default:
                    break;
            }
        }
    }
}
```

| Method | Mean | Error | StdDev |
|---|---|---|---|
| Calculate | 457.24 ns | 2.536 ns | 2.372 ns |
| Cache | 55.43 ns | 0.410 ns | 0.364 ns |
| Ram | 5,026,508.40 ns | 18,058.436 ns | 14,098.839 ns |
| FlushCacheOverhead | 5,019,991.08 ns | 26,890.458 ns | 20,994.301 ns |

You can see I flushed the CPU cache before each access. This of course has its own overhead, so I measured it in a separate benchmark (FlushCacheOverhead). Subtracting that overhead, we get these results:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| Calculate | 457.24 ns | 2.536 ns | 2.372 ns |
| Cache | 55.43 ns | 0.410 ns | 0.364 ns |
| Ram | 6,517.32 ns | ? | ? |

It looks like this time RAM is ~117x slower than cache. That's much more believable to me, considering the common 10-100x figure.
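
For reference, the arithmetic behind the adjusted Ram figure and the ratio can be written out as follows, using only the means reported in the tables above. This is a rough sketch: it assumes the flush overhead is purely additive, and it ignores the measurement error.

```latex
\begin{align*}
t_{\text{Ram}} &\approx 5{,}026{,}508.40\,\text{ns} - 5{,}019{,}991.08\,\text{ns} = 6{,}517.32\,\text{ns} \\
\frac{t_{\text{Ram}}}{t_{\text{Cache}}} &\approx \frac{6{,}517.32\,\text{ns}}{55.43\,\text{ns}} \approx 117.6
\end{align*}
```

The difference (about 6,500 ns) is smaller than the Error reported for either raw measurement (about 18,000 ns and 27,000 ns), which is presumably why the Error and StdDev cells are left as "?". The ratio is therefore probably best read as an order-of-magnitude estimate.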

How to take into account memory access #2513

Answered by timcassell
MartyIX asked this question in Q&A

