Vinai Kopp

Magento Expert, Developer & Trainer

  • Pre-loading For Scalability

    June 9, 2014

    Background

    Part of my work consists of “performance reviews” (that is, figuring out why Magento sites are slow or not scaling well and how to fix them), or giving trainings on performance related development best practices.

    There are many aspects, but one issue that I encounter in hundreds of variations is what I want to focus on in this post.

    Consider the following bit of code in a recently viewed products template:

    <?php foreach ($_products as $_item): ?>
    <?php $manufacturer = $_item->getId(); ?>
    <li class="item">
        <a href="<?php echo $_item->getProductUrl() ?>" class="product-image">
            <img src="<?php echo $this->helper('catalog/image')->init($_item, 'small_image')->resize(230) ?>"
                 width="230"
                 height="230" />
        </a>
        <?php $catType = Mage::getSingleton('catalog/product')->load($_item->getProductId()); ?>
        <span class="category_type">
            <?php echo $catType->getAttributeText('category_type') ?>
        </span>
        <div class="product-holder">
            <h2 class="product-name">
                <a href="<?php echo $this->getProductUrl($_item) ?>">
                    <?php echo $this->helper('catalog/output')->productAttribute($_item, $_item->getName() , 'name') ?>
                </a>
            </h2>
            <?php echo $this->getPriceHtml($_item, true) ?>
        </div>
    </li>
    <?php endforeach; ?>

    The issue, of course, is the line

    <?php $catType = Mage::getSingleton('catalog/product')->load($_item->getProductId()); ?>

    If you are thinking I’m going to rant about calling load() in a template you are off track.
    There are slightly more efficient ways to load individual product attributes, but still that is not the point I’m trying to make.
    In regards to performance that doesn’t matter.

    The real issue is that the line containing the load() is wrapped within a foreach loop.

    The issue

    What it comes down to is that loading things in a loop does not scale.

    Consider the amount of database queries this bit of code will execute.

    1. There has to be a collection of items to loop over in the first place. In this case its a product collection, which is loaded using two queries.
    2. Then the load() statement loads each product, and since its an EAV entity, that once again means two queries per product.

    So what is the total amount of queries?
    The point is, we don’t know. Its impossible to say because it depends on the number of products in the collection.
    If there is a single product in the collection, this code block will cause 4 queries to be executed.
    If there are 100 products in the collection, there would be 202 queries being executed (2 for the collection and 2 for each product: 2 + 100*2).

    During performance related development trainings I like to do a small exercise.
    It consists of rendering a category tree using a recursive method.
    Each method call loads the children of the parent category and renders them.

    Note: there are methods to load a category tree at once in Magento. The point of this exercise is not how to load categories, the point is to demonstrate the difference between recursive loading and pre-fetching.

    protected function _renderCategory(Mage_Catalog_Model_Category $category)
    {
        $output = str_pad('', $category->getLevel() * 3,
            ' ', STR_PAD_LEFT) . $category->getName() . PHP_EOL;
        foreach ($this->_getChildren($category) as $child) {
            $output .= $this->_renderCategory(
                $category->setData($child)
            );
        }
     
        return $output;
    }
     
    protected function _getChildren(Mage_Catalog_Model_Category $category)
    {
        ...
        // load child category using a simple SQL query
        // and return array of rows
    }

    Using this code on a dev instance with the 27 categories of the magento sample data, no problem. The page loads in 500ms or so, depending on the system setup.

    How many categories are there in a medium sized real shop? Maybe 500? Maybe 1000? Some have more, some have less. The exact number doesn’t matter.
    The next step during the training is hacking a little script to create 1000 randomly nested categories.
    And guess what? With 1000 categories the page now loads in 60 seconds or more. If you try to use the Magento profiler, you will have to assign a s***load of memory that the page loads at all!

    The takeaway is obvious: the number of queries should always be the same. They should be independent of the number of records being loaded.

    In the training the next step is to build a variation of the exercise that pre-loads all categories at the beginning, and then using that to render the tree.
    The result is striking. It easily scales to 5000 categories loading in a couple of seconds or faster (again, the exact time depends on the development instance setup).

    The solution

    Preloading enables us to have the same number of queries regardless of data size.
    Sometimes it can be tricky to figure out how to preload the data set that is required without a recursive structure. It might not be possible using a single query. Maybe 2 or 4 queries are required.
    For example, maybe the root category for the current store needs to be determined first, before the category branch below can be fetched.

    Having multiple queries is perfectly okay. The important thing is that always the same amount of queries will be executed, regardless of the number of records that are being fetched.
    This fixed number of queries then can be optimized until the site performs according to the owners expectations.

    Different incarnations or the problem

    The issue I’ve described above comes in many flavors and disguises.
    When I’m analysing the performance bottlenecks of a site that doesn’t scales well, the product listing pages almost always suffer from this issue.
    Typical instances where this issue of recursive loading surfaces are:

    • Loading of product attributes for the filter navigation
    • Configurable products loading their child products
    • Loading of configurable attributes for each product
    • Product gallery images for additional thumbnails
    • Stock and availability for each product

    The underlying issue always is the same. Usually the site performs okay on a development instance with a small catalog, but it turns out to be much to slow in production.

    The developer who builds the site generally believe himself innocent, since he could have been using the correct Magento core classes and methods.
    For example:

    $product->getTypeInstance(true)->getConfigurableAttributes()

    This method causes massive loading, it was never intended to be used in a loop. However, the recursive nature of loading data is nicely hidden, unless you dig deep into the core.

    (Note: Thanks to Tim’s question in the comments below, here is an example how to pre-fetch configurable attributes. This implementation was for one concrete project. But since each project has different requirements, so please treat the gist as an example, not as a ready solution.)

    Hiding the issues

    The most common approach is to solve scalability problems is to cache aggressively. But this is not a real solution. Adding caching can be compared to sweeping dirt under a rug to hide it.

    Usually the caching does seems to help, but mostly the result will be unsatisfactory: some requests will be cache misses and still take very long to render, or even time-out.

    One extreme symptom of using caching to hide scalability issues on a application level can be that clearing the cache causes the site to crash, because of all the requests generate a too high load on the server.

    The way out

    Always the path out of this scalability mess is profiling to find the place causing the issue and implementing pre-fetching.

    This isn’t rocket science. This isn’t new. But the problems happens easily, especially in Magento where the underlying implementation of some functionality may be hidden deep in the class hierarchy.

    There are many helpful tools available:

    I like using the Magento profiler, either Fabrizio Branca’s extension or some project specific customization, usually using via an event observer logging the profiling data in a format that can easily be analyzed.

    Another invaluable too is xdebug profiling with kcachegrind (or qcachegrind) as a viewer. PHPStorm also can visualize cachgrind data (but personally I prefer qcachegrind).

    Magento ECG provides a great code sniffer rule for finding LSD and data load methods within loops, as long as they are called directly in the loop construct. Thanks to Alan MacGregor for bringing this up.

    New Relic is great, too. Others love Xhprof.

    But whatever tool you use:
    The solution to scalability issues due to recursive loading always is pre-loading.

    comments powered by Disqus