Program runs slower when compiled with Visual Studio 2005 (in Release with optimization)

Hey,

I have a performance problem with the new Visual Studio 2005. My code (a genetic algorithm) runs 3 times slower when compiled with Visual Studio 2005 than it did when compiled with Visual Studio 2003.

With Visual Studio 2003 it ran in 3.0639 seconds; now it runs in 9.0031 seconds. I build both versions in Release. I tried every optimization option and also added /D_SECURE_SCL=0 to the compiler options, with only minor differences.

Is there any other way to get back the speed of Visual Studio 2003? It is a bit absurd to upgrade to a new compiler and get worse performance.

Thanks a lot!!!

Antoine Atallah



  • Sys Manager

    Hello,

       I think Gorm is right about operator new... It looks like someone messed it up in 2005. The CRT code looks like this:

    VISUAL STUDIO 2005:
    void * operator new( size_t cb )
    {
        void *res;

        for (;;) {

            //  allocate memory block
            res = _heap_alloc(cb);

            //  if successful allocation, return pointer to memory

            if (res)
                break;

            //  call installed new handler
            if (!_callnewh(cb))
                break;

            //  new handler was successful -- try to allocate again
        }

        RTCCALLBACK(_RTC_Allocate_hook, (res, cb, 0));

        return res;
    }

    VISUAL STUDIO 2003:
    void * operator new( size_t cb )
    {
            void *res = _nh_malloc( cb, 1 );

            RTCCALLBACK(_RTC_Allocate_hook, (res, cb, 0));

            return res;
    }

    Can anyone explain why there is a loop in operator new? It looks to me like a humongous waste of processing time!

    Thanks a lot!

    Antoine


  • Raju_Sreenivasan

    Hey Gorm,

       If you look at the code of malloc, it is actually worse than the code of new in 2005: there are more operations, and it ends up calling the same _heap_alloc in the end. Basically, there is no performance gain in choosing one over the other.

       Antoine

  • pradeepp

    Could you please post the exact compiler options you were using? You can get those from the build log.

    Thanks,
      Ayman Shoukry
      VC++ Team

  • Dan Mikkelsen

    Hi Ayman,

    The code is proprietary and I cannot post it publicly. However, here is part of the code, which exhibits the same problem. Basically, it is a linked list and a bunch of nodes. Inserting 1,000,000 nodes into the linked list is faster when compiled with Visual Studio 2003 in Release (501 milliseconds on my PC) than when compiled with Visual Studio 2005 in Release (753 milliseconds on my PC).

    Here is the sample code (stripped for the web):

    #include "stdafx.h"
    #include "ALAbstractList.h"
    #include "ALIntNode.h"
    #include <stdio.h>
    #include <time.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
     ALAbstractList alalist;

     clock_t start = clock();
     for (int i = 0; i < 1000000; i++)
     {
      alalist.InsertAtEnd(new ALIntNode(i));
     }
     // clock() returns ticks; convert to milliseconds explicitly
     printf("Time Spent (in milliseconds) = %ld\n", (long)((clock() - start) * 1000 / CLOCKS_PER_SEC));

     return 0;
    }


    ------------------------------

    /**
     * \author Antoine Atallah
     * \version 1.0
     * \date 04/25/2005
     * \brief Defines a generic Abstract List node which can contain integers
     *
    */
    #pragma once
    #include "alconstants.h"
    #include "alabstractnode.h"

    /** \class ALIntNode
     \brief Abstract Node for Integers
     */
    class ALIntNode : public ALAbstractNode
    {
    public:
     ALIntNode(int data) {i_data = data;};
     virtual ~ALIntNode(void) {}; // defined inline so the stripped-down sample links

     int GetData() {return i_data;};

    private:
     int i_data; ///< Data contained by the node!
    };

    --------------------------------

    /**
     * \author Antoine Atallah, Aidee Carriere
     * \version 1.0
     * \date 04/21/2005
     * \brief The abstract node is the base class for anything which can be inserted in an
     * abstract list (see ALAbstractList.h).
     *
    */

    #pragma once
    #include "stdafx.h"

    /*! \class ALAbstractNode
     \brief This is a class describing an abstract node (base class with a virtual destructor)
    */
    class ALAbstractNode
    {
    //friend class ALAbstractList;
    public:
     /**
      * \brief Constructor of a node: initializes the next and previous links to NULL.
      */
     ALAbstractNode() {palan_nextNode = NULL; palan_prevNode = NULL;};

     /**
      * \brief Virtual Destructor of a node
      */
     virtual ~ALAbstractNode(void) {};

     /**
      * \brief Sets the next node to the current node
      * \param nextNode is a pointer to the new next node (can be NULL to clear it)
      */
     inline void SetNextNode(ALAbstractNode * nextNode) {palan_nextNode = nextNode;};

     /**
      * \brief Gets the next node to the current node
      * \return Returns a pointer to the next node
      */
     inline ALAbstractNode * GetNextNode() {return palan_nextNode;};

     /**
      * \brief Sets the previous node to the current node
      * \param prevNode is a pointer to the new previous node (can be NULL to clear it)
      */
     inline void SetPrevNode(ALAbstractNode * prevNode) {palan_prevNode = prevNode;};

     /**
      * \brief Gets the previous node to the current node
      * \return Returns a pointer to the previous node
      */
     inline ALAbstractNode * GetPrevNode() {return palan_prevNode;};

    private:
     ALAbstractNode * palan_nextNode; ///< Pointer to the next node in the list
     ALAbstractNode * palan_prevNode; ///< Pointer to the previous node in the list
    };

    --------------------------------

    /** \file
     * \author Antoine Atallah, Aidee Carriere, Jean-Francois Cote
     * \version 1.0; Antoine and Aidee
     *  \version 1.1; Antoine
     * \date 04/21/2005, 05/03/2005
     *
    */

    #pragma once
    #include "stdafx.h"
    #include "ALAbstractNode.h"
    #include "ALConstants.h"

    #define AL_END_OF_LIST   65534  ///< Constant used to insert at the end of a list
    #define AL_BEGINNING_OF_LIST 0   ///< Constant used to insert at the beginning of a list

    // Forward Declaration
    class ALAbstractNode;

    /*! \class ALAbstractList
     \brief This is a class describing the Abstract List system
    */
    class ALAbstractList
    {
    public:
     // Class Constructor
     ALAbstractList();
     virtual ~ALAbstractList(void);

     // Basic list functions for data insertion
     void InsertAtEnd(ALAbstractNode * nodeToInsert);
     void InsertAfter(ALAbstractNode * node, ALAbstractNode * nodeToInsert);

    protected:
     ALAbstractNode * palan_firstNode;  ///< Pointer to the first node of the list
     ALAbstractNode * palan_lastNode;  ///< Pointer to the last node of the list
     int i_listSize;       ///< Size of the list
    };

    ---------------------------

    #include "stdafx.h"
    #include "alabstractlist.h"

    /*!
     * \brief
     * The constructor of the list: initializes an empty list.
     */
    ALAbstractList::ALAbstractList()
    {
     // initializes the list items
     palan_firstNode = NULL;
     palan_lastNode = NULL;
     i_listSize = 0;
    }

    /*!
     * \brief
     * Destructor of the abstract list
     */
    ALAbstractList::~ALAbstractList(void)
    {
     // Removed for code example simplicity...
    }

    /*!
     * \brief
     * Inserts a node after the selected node
     *
     * \param node is the selected node
     * \param nodeToInsert is the node to insert after the selected node
     * \remarks This function is NOT thread safe
     */
    void ALAbstractList::InsertAfter(ALAbstractNode * node, ALAbstractNode * nodeToInsert)
    {
     ALAbstractNode * palan_tempNode = node;

     // Increases the size of the list, since insertion is always guaranteed
     i_listSize++;

     // If the list is empty... Insert at the beginning
     if ((palan_firstNode == NULL) && (palan_lastNode == NULL))
     {
      palan_firstNode = nodeToInsert;
      palan_lastNode = nodeToInsert;

      nodeToInsert->SetNextNode(NULL);
      nodeToInsert->SetPrevNode(NULL);
     }
     //we insert at the beginning of the list
     else if (palan_tempNode == NULL)
     {
      nodeToInsert->SetNextNode(palan_firstNode);
      nodeToInsert->SetPrevNode(NULL);
      palan_firstNode->SetPrevNode(nodeToInsert);
      palan_firstNode = nodeToInsert;
     } // if (palan_tempNode == NULL)

     // We insert at the end of the list
     else if (palan_tempNode->GetNextNode() == NULL)
     {
      palan_lastNode->SetNextNode(nodeToInsert);
      nodeToInsert->SetNextNode(NULL);
      nodeToInsert->SetPrevNode(palan_lastNode);
      palan_lastNode = nodeToInsert;
     } // if (palan_tempNode->GetNextNode() == NULL)
     else
     {
      // Sets the link of the node!
      nodeToInsert->SetPrevNode(palan_tempNode);
      nodeToInsert->SetNextNode(palan_tempNode->GetNextNode());
      palan_tempNode->SetNextNode(nodeToInsert);
      nodeToInsert->GetNextNode()->SetPrevNode(nodeToInsert);
     }
    }

    /*!
     * \brief
     * Inserts a node at the end of the list
     *
     * \param nodeToInsert is the node to append after the current last node
     * \remarks This function is NOT thread safe
     */
    void ALAbstractList::InsertAtEnd(ALAbstractNode * nodeToInsert)
    {
     InsertAfter(this->palan_lastNode, nodeToInsert);
    }
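    For anyone who wants to reproduce the measurement without the project files, here is a condensed single-file sketch of the same benchmark. It is an approximation, not the author's code: stdafx.h and ALConstants.h are dropped, the classes are collapsed into the hypothetical Node, IntNode and List, and only append-at-end is kept. Like the original, it performs one heap allocation per node, which is the cost under discussion.

```cpp
#include <cstdio>
#include <ctime>

// Hypothetical stand-in for ALAbstractNode: a doubly linked node with a virtual destructor.
struct Node {
    Node* next = nullptr;
    Node* prev = nullptr;
    virtual ~Node() {}
};

// Hypothetical stand-in for ALIntNode.
struct IntNode : Node {
    explicit IntNode(int d) : data(d) {}
    int data;
};

// Hypothetical stand-in for ALAbstractList, reduced to InsertAtEnd.
class List {
public:
    List() = default;
    List(const List&) = delete;
    List& operator=(const List&) = delete;
    ~List() {
        // Free every node (the original destructor body was elided in the post).
        Node* n = first_;
        while (n) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }
    void InsertAtEnd(Node* node) {
        node->next = nullptr;
        node->prev = last_;
        if (last_)
            last_->next = node;
        else
            first_ = node;
        last_ = node;
        ++size_;
    }
    int Size() const { return size_; }

private:
    Node* first_ = nullptr;
    Node* last_ = nullptr;
    int size_ = 0;
};

// Times `count` insertions (one `new` each) and returns the final list size.
// The list's destructor runs after the timing is printed, so it is not measured.
int RunBenchmark(int count) {
    List list;
    std::clock_t start = std::clock();
    for (int i = 0; i < count; ++i)
        list.InsertAtEnd(new IntNode(i));
    double ms = 1000.0 * (std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("Inserted %d nodes in %.0f ms\n", list.Size(), ms);
    return list.Size();
}
```

    Calling RunBenchmark(1000000) from main in a build from each compiler gives a like-for-like comparison; the absolute numbers will depend on the machine.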



  • gshaf

    Actually, you are right that the for (;;) is not that bad, as it probably compiles to a single branch in the assembly code. The real problem would be the if (res), which adds an extra instruction plus a possible branch misprediction.

    Overall, the new "new" is 33% slower than the previous "new".

    The same thing can be observed in the code of memcpy, where the new 2005 version has an if checking the validity of the parameters. Since a bunch of those ifs were added here and there in the library, they make (overall) algorithms and other computations 3 times slower. Even pure C code is slower in some cases (for example, memcpy and memmove).

    I wish there were some #define to turn these checks off and speed things up...

    Antoine

  • Mohican

    Hi Antoine,
      I am actually interested in investigating this slow performance. Could you please post a sample exhibiting the problem? I will be more than happy to look at what exactly is happening.

    Thanks,
      Ayman Shoukry
      VC++ Team


  • Tom Archer - MSFT

    That 'infinite' loop is the correct way to implement operator new, although it "should" throw std::bad_alloc if the new handler fails (perhaps that happens inside _callnewh(), or this is the nothrow version).

    Why does the existence of a loop mean a "humongous waste of processing"? It doesn't, at all.
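    To illustrate why the loop is cheap, here is a minimal sketch of the retry loop the C++ standard describes for operator new, written as an ordinary function so it does not replace the real global operator new. The names sketch_operator_new and sketch_operator_delete are hypothetical, std::malloc stands in for the CRT's internal _heap_alloc, and std::get_new_handler is a C++11 function that VS2005 itself does not provide. On the success path the body runs exactly once, so the only extra work over a straight call is one null-pointer test; the loop only repeats when allocation fails and a new handler is installed.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a global operator new; not the actual CRT source.
void* sketch_operator_new(std::size_t size) {
    if (size == 0)
        size = 1;  // a zero-byte request must still yield a unique non-null pointer
    for (;;) {
        void* p = std::malloc(size);  // plays the role of _heap_alloc
        if (p)
            return p;  // common case: one allocation, one test, no second iteration
        std::new_handler handler = std::get_new_handler();
        if (!handler)
            throw std::bad_alloc();  // no handler installed: report failure
        handler();  // the handler may free memory and return (retry), or throw
    }
}

// Matching release function for memory obtained from sketch_operator_new.
void sketch_operator_delete(void* p) noexcept {
    std::free(p);
}
```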

  • amitabhcy

    I didn't mean that I disliked the new new; I just wanted to stress that you might want to malloc more at a time. One way of doing this could be overriding new so that it calls malloc less often.

  • Aneo

    Antoine,

    Can you send us the command-line parameters you are using to build the app in VS2003 and VS2005? If you are building in the IDE, you can just send us the build logs.

    When you say that "new" is now 33% slower, you are ignoring the cost of the following line in new:

    res = _heap_alloc(cb);

    This function eventually calls into HeapAlloc, which maintains the heap. Heap allocation is orders of magnitude costlier than the additional check in new.

    mem* functions should not have any additional checks. If you are seeing slowdown in using these functions please send us a repro case.

    Other CRT functions do have the additional parameter tests as a result of Secure CRT work.

    Thanks,
    Sridhar Madhugiri
    Software Developer
    VisualC++

    ---
    This posting is provided "AS IS" with no warranties, and confers no rights.


  • Jacob Grass

    Hi Antoine,
     We are currently investigating the repro case. I will keep you updated once the analysis of the issue is done.

    Thanks,
      Ayman Shoukry
      VC++ Team

  • roy-roy

    Hi.
    Interesting.

    That is a lot of "new"s.
    It looks to me like malloc is taking its share of the ticks here. Is this the case for your other program, too?

    My immediate guess would be that there is more code behind every malloc for some (security) reason.

    Does it help to allocate more at a time?
    Maybe overriding the new operator could be a possibility for your scenario? If you know how much memory you need, I guess one million malloc calls (and possibly more for your leaf objects) are unnecessarily brutal.

    Hope this helps
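    The "override new" suggestion can be sketched as a class-specific operator new backed by a free list. This is an illustration under assumptions, not code from the thread: PooledNode, kSlotsPerBlock and the free-list layout are hypothetical. The class takes memory from the heap in blocks of many nodes, so malloc is called once per block instead of once per node, and freed nodes are reused. It is not thread safe and never returns blocks to the heap, which may be acceptable for a benchmark like the one above but not in general.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical node type whose allocations go through a simple pool.
struct PooledNode {
    explicit PooledNode(int d) : data(d) {}
    int data;
    PooledNode* next = nullptr;

    static void* operator new(std::size_t size);
    static void operator delete(void* p, std::size_t size) noexcept;
};

namespace {

// A slot holds either a live PooledNode or, when free, a link to the next free slot.
union Slot {
    Slot* nextFree;
    alignas(PooledNode) unsigned char storage[sizeof(PooledNode)];
};

const std::size_t kSlotsPerBlock = 4096;  // arbitrary block size (assumption)
Slot* g_freeList = nullptr;               // shared state: not thread safe
std::size_t g_blocksAllocated = 0;        // number of malloc calls made by the pool

// Allocates one block and threads all of its slots onto the free list.
void RefillFreeList() {
    Slot* block = static_cast<Slot*>(std::malloc(kSlotsPerBlock * sizeof(Slot)));
    if (!block)
        throw std::bad_alloc();
    ++g_blocksAllocated;
    for (std::size_t i = 0; i < kSlotsPerBlock; ++i) {
        block[i].nextFree = g_freeList;
        g_freeList = &block[i];
    }
}

}  // namespace

void* PooledNode::operator new(std::size_t size) {
    if (size != sizeof(PooledNode))  // a derived class falls back to the global heap
        return ::operator new(size);
    if (!g_freeList)
        RefillFreeList();
    Slot* s = g_freeList;  // pop the most recently freed (or newest) slot
    g_freeList = s->nextFree;
    return s;
}

void PooledNode::operator delete(void* p, std::size_t size) noexcept {
    if (!p)
        return;
    if (size != sizeof(PooledNode)) {
        ::operator delete(p);
        return;
    }
    Slot* s = static_cast<Slot*>(p);  // push the slot back onto the free list
    s->nextFree = g_freeList;
    g_freeList = s;
}
```

    Because the free list is last-in, first-out, a node deleted and immediately reallocated gets the same address back, which also tends to be cache friendly.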

  • dreadjr

    Hello.

    > If you look at the code of malloc, it is actually worse than the code of new in 2005... there are more operations and it ends up with the same _heap_alloc in the end... basically, there is no performace gain by choosing one or the other.

    Sorry. I was a bit unclear. By "malloc" I mean memory allocation on heap (e.g. _heap_alloc). I don't believe there is any real difference between allocating memory with "new", "malloc" or even "_heap_alloc"... 

    What I meant to say was that you can, with some effort, allocate all the memory you need in one or a few chunks, and then hand it out a bit at a time in a method you call "operator new".

    Excuse me if I don't always make sense.
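    The "one or a few big chunks, handed out a bit at a time" idea can be sketched as a bump (arena) allocator. Arena, its default chunk size and its release-everything-at-once policy are assumptions for illustration. Individual objects are never freed, so this only fits data whose lifetime ends together (for example, a whole list discarded at once), and destructors of objects placed in the arena must be run manually if they matter. A class-specific operator new could forward to Allocate.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical bump allocator: carves small allocations out of large malloc'd chunks.
class Arena {
public:
    explicit Arena(std::size_t chunkSize = 1 << 20) : chunkSize_(chunkSize) {}
    ~Arena() {
        for (void* c : chunks_)
            std::free(c);  // release everything at once
    }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns `size` bytes aligned to `align`, which must be a power of two
    // no larger than alignof(std::max_align_t) (malloc only guarantees that much).
    void* Allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (!current_ || offset + size > capacity_) {
            NewChunk(size);
            offset = 0;  // chunk start is already suitably aligned
        }
        used_ = offset + size;
        return current_ + offset;
    }

    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    void NewChunk(std::size_t minSize) {
        std::size_t cap = minSize > chunkSize_ ? minSize : chunkSize_;
        chunks_.push_back(nullptr);  // grow the vector first so a throw here cannot leak a chunk
        void* c = std::malloc(cap);
        if (!c)
            throw std::bad_alloc();  // the null entry is harmless: free(nullptr) is a no-op
        chunks_.back() = c;
        current_ = static_cast<char*>(c);
        capacity_ = cap;
        used_ = 0;
    }

    std::size_t chunkSize_;
    std::vector<void*> chunks_;
    char* current_ = nullptr;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
};
```

    Objects are constructed in arena memory with placement new, for example `new (arena.Allocate(sizeof(int), alignof(int))) int(42)`.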


  • OoLee

    I'm interested in knowing what you end up finding out about this.  I'm curious as to whether it is some compiler options that make the difference. 
  • Shahid Mahmood

    Hello,

       Sure, here are the compiler options for 2005:

    Compiler: /O2 /Ob1 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "_WINDOWS" /D "_VC80_UPGRADE=0x0710" /D "_MBCS" /FD /EHa /MT /GS- /arch:SSE2 /fp:fast /GR- /Fo"Release\\" /Fd"Release\vc80.pdb" /W2 /nologo /c /Wp64 /TP /wd4996
    /errorReport:prompt

    Linker: /OUT:"Release/Core.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST
    /MANIFESTFILE:"Release\Core.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE
    /HEAP:10485760,10485760 /STACK:10485760,10485760
    /LARGEADDRESSAWARE:NO /TSAWARE:NO /OPT:REF /OPT:ICF
    /OPT:NOWIN98 /LTCG /MACHINE:X86 /FIXED:No /ERRORREPORT:PROMPT Ws2_32.lib  kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
       
       For 2003:

    Compiler: /O2 /Ot /GT /G7 /GA /D "WIN32" /D "_WINDOWS"
    /D "_MBCS" /FD /EHsc /MT /GS /arch:SSE2 /Fo"Release/"
    /Fd"Release/vc70.pdb" /W2 /nologo /c /Wp64 /TP

    Linker: /OUT:"Release/Core.exe" /INCREMENTAL:NO /NOLOGO
    /SUBSYSTEM:CONSOLE /HEAP:10485760,10485760 /STACK:10485760,10485760
    /OPT:REF /OPT:ICF /MACHINE:X86 /FIXED:No Ws2_32.lib  kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

       Thanks a lot!

       Antoine
