
Optimization Tips

  • Don't second-guess the compiler
    Most optimization tricks of the past are no longer necessary with modern compilers, including unrolling loops and combining methods to avoid function calls. If you're using a managed language deployed to a platform with a JIT compiler, those tricks actually hurt performance by denying the JIT better optimizations it could make automatically. See "What the compiler saw" below for more

  • Let the profiler tell you what to optimize
    Run your code in a profiler to see which functions your program spends the most time in and which loops get the most mileage, then optimize those. You may think you know which functions are slowing your program down, but the profiler will find hot spots you never realized existed

  • Pave the trodden areas
    Before making views and indexes to go with your database schema, let development on the application get underway and periodically review the kinds of queries your team is writing, then create views and indexes to optimize those. Overzealous, unguided indexing makes INSERTs and UPDATEs slower without any benefit

  • Localization in moderation
    The distance to your data should be inversely proportional to its popularity. If you need to know the US$-to-Euro exchange rate a thousand times a second for your international e-commerce site, then it needs to be on the local machine--in RAM--and you'll need either a caching or replication mechanism to keep it fresh. But if you only need to know it once at midnight when a batch job runs to spit out invoices, then you can spend as long as you like on an HTTP request to a server halfway across the world; it would be pointless to waste engineering time caching that data
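
    EG, a minimal sketch of keeping a hot value in local RAM with a timed refresh. FetchRateFromService() is a made-up stand-in for the real remote call, and a production version would need locking for thread safety:

    static class RateCache
    {
        private static decimal cachedRate;
        private static DateTime fetchedAt = DateTime.MinValue;
        private static readonly TimeSpan MaxAge = TimeSpan.FromSeconds(30);

        public static decimal UsdToEur()
        {
            if (DateTime.UtcNow - fetchedAt > MaxAge)
            {
                cachedRate = FetchRateFromService(); // the slow, remote lookup
                fetchedAt = DateTime.UtcNow;
            }
            return cachedRate; // thousands of reads per second hit RAM, not the network
        }

        private static decimal FetchRateFromService() { return 0.92m; } // stand-in
    }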

  • Every piece of data in its place
    Not everything belongs in the database, or XML, or the cloud. When you pick a place for data to live it should stay there, which means giving thought to how the data will be used and what mechanisms will localize it. Pools of frequently updated data belong in a database, using caches for localization, while sessile data (infrequently changing, such as configuration or lists of country codes) belongs in static files and should be localized through replication, not caching

  • Beware the lazy-fetch, my son
    Using an ORM? Most come with a deferred-loading, or "lazy fetch", feature that can either help performance or kill it. It helps when you seldom need the related data--you don't waste time fetching what you don't need--but it kills performance when you need everything and the ORM generates a zillion queries as you churn through a big loop. When you need a lot of data from many tables, drop back to classic SQL and give up the ORM--at least for those areas of your program that would hurt from it
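
    EG, the classic "N+1" trap, sketched with made-up Order/Customer entities and assuming an Entity Framework-style ORM (Include() is its eager-loading hint):

    // Lazy fetch: one query for the orders, then one more query per order
    // as each o.Customer is touched inside the loop
    foreach (var o in db.Orders)
        Console.WriteLine(o.Customer.Name);

    // Eager fetch: one joined query up front, Customer already loaded
    foreach (var o in db.Orders.Include(ord => ord.Customer))
        Console.WriteLine(o.Customer.Name);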

  • Caching is hard, let's go shopping!
    Before building your own caching mechanism, look for an off-the-shelf solution, because if you build your own you should expect to spend most of your debugging time on it. Memcache and Squid are examples of off-the-shelf caches you can shop for (and they're free). Use Memcache if you frequently fetch arbitrary entities from a database, and Squid if you use a REST architecture

  • Know what caching is already being done before adding more
    Furthermore, you may already be enjoying the benefits of a cache without knowing it. Some common built-in caches: database connection pooling, ORM entity caching, and file I/O buffering. Sometimes the cache isn't even on your machine, such as when a database caches the results of a popular query. Duplicate caches can hurt performance by consuming RAM that might be better spent elsewhere, so ask yourself whether it's worth caching that data a second time on the client, or better to sacrifice bandwidth for memory and clarity

  • Use stages and pipelines to maximize cache hits
    Some of the best performance gains come from breaking CPU-intensive work into stages so that the instruction and data caches on the CPU die itself are used with the greatest efficiency. EG: if you're hashing, signing and compressing lots of records, then do all of the hashing first, then all of the signing, and then all of the compressing
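
    A sketch of the contrast, with a made-up records list and Hash(), Sign(), Compress() and Store() standing in for whatever the per-record work really is:

    // Interleaved: every iteration bounces between three routines, evicting
    // each other's code and data from the CPU caches
    foreach (var r in records)
        Store(Compress(Sign(Hash(r))));

    // Staged: each pass is one tight loop, so one routine's code and working
    // set stay hot in cache for the whole pass
    var hashed     = records.Select(Hash).ToList();
    var signed     = hashed.Select(Sign).ToList();
    var compressed = signed.Select(Compress).ToList();
    compressed.ForEach(Store);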

  • Big dataset? Prefer Data-Oriented design instead of Object-Oriented
    Moore's law has outpaced improvements in memory speed, and even though RAM is faster today than ever, the mismatch in growth means that RAM has been getting slower when measured in CPU cycles: in 1980 it took just one CPU cycle to read or write RAM, but today it can be over 400. OOP, with its emphasis on encapsulating data with code, scatters your data across the heap and is not optimal when you're processing very large amounts of it; data-oriented design keeps the fields you actually process contiguous in memory so the CPU can stream through them
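
    A minimal sketch of the two layouts (Particle is a made-up example type):

    // Object-oriented: an array of references to heap objects; updating one
    // field means chasing a pointer per particle and dragging whole objects
    // through the cache
    class Particle { public float X, Y, VX, VY; }
    Particle[] particles = new Particle[1000000];

    // Data-oriented: one contiguous array per field; the update loop streams
    // through memory the prefetcher can stay ahead of
    float[] xs = new float[1000000];
    float[] vxs = new float[1000000];
    for (int i = 0; i < xs.Length; i++)
        xs[i] += vxs[i];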

  • Thread the high-latency stuff
    Network and disk I/O is high-latency, which means the CPU spends a lot of time doing nothing while it waits for data to arrive. Spin these tasks off into a separate thread and begin doing your CPU work as the data trickles in. The BackgroundWorker class is one easy way to do this, providing a ProgressChanged event to marshal data back into the main thread. Then there are Begin/End-style asynchronous interfaces that do the threading transparently, holding the data until your program comes back to fetch the results
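
    EG, a minimal BackgroundWorker sketch; the Thread.Sleep() loop stands in for real network or disk reads:

    var worker = new BackgroundWorker { WorkerReportsProgress = true };
    worker.DoWork += (s, e) =>
    {
        for (int pct = 0; pct <= 100; pct += 10)
        {
            Thread.Sleep(100);          // pretend to wait on I/O
            worker.ReportProgress(pct); // hand partial results back
        }
    };
    worker.ProgressChanged += (s, e) =>
    {
        // Raised on the thread that created the worker in UI apps,
        // so it's safe to touch controls here
        Console.WriteLine("{0}% done", e.ProgressPercentage);
    };
    worker.RunWorkerAsync(); // the main thread is free to keep working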

  • struct vs. class? Remember values vs. behaviors
    C# has both structs (passed by value) and classes (passed by reference). The performance penalty for using a class where you don't need one is heap allocation, dereferencing and garbage collection. The downside of structs is that they can't be null, which can be wasteful: myStruct[] foo = new myStruct[1000] allocates storage for all 1,000 structs up front, whereas an array of class references starts out as 1,000 cheap nulls. In general, if your type is a wrapper for a scalar value then use a struct, while if it's going to be used for its behavior, implement an interface, or become part of a hierarchy, then use a class. (But do not retroactively change all your value classes to structs or vice-versa unless you know what you're doing, or you'll create bugs from the subtle differences in how they work.)
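
    EG, a made-up scalar wrapper that suits a struct, next to a type that earns a class:

    // Wraps a single scalar: no heap allocation, no GC pressure, copied
    // by value just like the number it wraps
    struct Celsius
    {
        public readonly double Degrees;
        public Celsius(double degrees) { Degrees = degrees; }
    }

    // Used for its behavior and identity, so a class fits
    class Thermostat
    {
        public Celsius Target { get; set; }
        public void Regulate() { /* ... */ }
    }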

  • JIT? Use small, composable functions
    When deploying to a platform with a JIT compiler, you'll get better runtime performance from a small number of general-purpose functions than from lots and lots of specific methods. This is because the JIT is only invoked just before a method is used for the first time, and the compiled native code is cached for every subsequent call. Amortized over the program's runtime, it's faster to use Group(Sort(Filter(data))) than GroupSortAndFilter(data), especially if you'd also need SortAndFilter(), GroupAndFilter(), SortAndGroup(), GroupAndSort() and so on

  • LINQ? Return expressions, not results
    If you're using LINQ, try to have functions return a deferred query--an expression tree, in LINQ-to-SQL's case--rather than an array of results. When you compose functions (like above), the evaluator will be able to produce a more efficient execution plan. EG:

    // Returns a deferred query; nothing executes until the caller enumerates
    IEnumerable<DataType> Filter(IEnumerable<DataType> data)
    {
       return from datum in data
              where datum.foo == 12345
              select datum;
    }

    This function returns a deferred query that, when enumerated, filters the input. If Sort() and Group() are written the same way, evaluation of the combined query is deferred until you loop through the results. And if you're using LINQ-to-SQL, your composition of multiple functions gets converted into a single T-SQL query

  • LINQ Caveat: Don't enumerate twice by mistake
    The above can come with a hidden tax, however: enumerating the same deferred query twice executes the query twice. If you're going to loop through the results more than once, stash the final results in an array with .ToArray() first
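
    EG, with a made-up products collection:

    var expensive = products.Where(p => p.Price > 100); // deferred, nothing run yet

    int count = expensive.Count();     // first enumeration: executes the query
    foreach (var p in expensive) { }   // second enumeration: executes it again!

    var cached = expensive.ToArray();  // execute once, then reuse the array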

  • Garbage collector? Stop wasting instances
    If your platform has a garbage collector, avoid creating excessive object instances; promote them to private, static members instead. For example, suppose you create a new SHA256CryptoServiceProvider every time you need a hash, and you hash thousands of times. That creates an enormous amount of pointless work for the GC
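
    EG (note that hash providers aren't thread-safe, so this sketch assumes single-threaded use or an external lock):

    // Wasteful: a new provider allocated and collected on every call
    static byte[] HashWasteful(byte[] data)
    {
        using (var sha = new SHA256CryptoServiceProvider())
            return sha.ComputeHash(data);
    }

    // Reused: one instance for the life of the program, nothing for the GC
    private static readonly SHA256CryptoServiceProvider sha =
        new SHA256CryptoServiceProvider();

    static byte[] HashReused(byte[] data)
    {
        return sha.ComputeHash(data);
    }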

  • Immutable strings? Use String.Format() and StringBuilder
    "Hello " + yourName + ", how are you?" creates 4 strings in memory at runtime in addition to yourName, whereas String.Format("Hello {0}, how are you?", yourName) creates only two. Use the StringBuilder class when you need to construct a complex string with loops

  • myString.Equals("foo", StringComparison.OrdinalIgnoreCase) is faster than myString.ToUpper() == "FOO"
    If your language has built-in string support, then it may also have special functions for testing case-insensitive equality. The above is from .Net, and it's faster than calling ToUpper() because strings are immutable in .Net, so ToUpper() wastes time creating a new string

  • Don't use Regular Expressions when Equals(), Replace(), Substr() and others will do
    Even compiled RegExes are orders of magnitude slower than basic string operations. Use RegExes when you simply can't do it with primitives, or when the code would be even more unreadable without them
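
    EG, a fixed-prefix test both ways; the plain primitive does the same job without spinning up the regex engine:

    string s = "hello world";

    bool slow = Regex.IsMatch(s, "^hello"); // regex machinery for a simple test
    bool fast = s.StartsWith("hello");      // the string primitive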

  • WPF? Use the OnEventName overrides instead of handling events
    If you need to handle MouseDown, KeyDown or other events on a WPF window, override the OnMouseDown and OnKeyDown methods instead of attaching handlers in XAML (as in <Window KeyDown="myKeyDownHandler" ...>). Handling events is slower because the underlying mechanism has to test for the existence of handlers and then iterate through all of the handlers that have been attached. This only works in derived classes (such as the code-behind created for each Window and custom control) and when the handler doesn't change during runtime, so for all other situations you'd continue to use events
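
    EG, in a window's code-behind (a minimal sketch):

    public partial class MainWindow : Window
    {
        // Called directly by the framework; no handler list to walk
        protected override void OnMouseDown(MouseButtonEventArgs e)
        {
            // ...respond to the click...
            base.OnMouseDown(e); // keep the normal event plumbing intact
        }
    }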

  • Don't keep trying to do the wrong thing faster
    All of the line-by-line optimizations in the world won't make Bubblesort faster than Quicksort (or even plain ol' Selection Sort). Are you doing sequential searches when a binary search is possible? Are you manipulating strings when you could convert them into numbers or graphs or some other model? Are you using the MS Office COM libraries to fire up a copy of Excel in the background, paste two numbers into cells, run an Excel function on them, read back the result and then shut down EXCEL.EXE? Naughty-naughty. Buy a computer science book and look for better ways of getting the same result

  • Design a program like you'd mount a car wheel
    If you completely tighten each lug nut one-at-a-time the wheel will go on crooked. So you tighten the nuts in alternating corners with at least two passes per nut to set the wheel snug and centered. When designing the components of a program you should not code and optimize each component to completion before starting on the next; you could be trapped with an inefficient interface. Instead, sketch the high-level design on paper, then keep alternating between components as you move into coding. You won't discover all of the possible high-level optimizations until you're knee-deep, and you won't be able to take advantage of them unless the implementation of each piece is still "loose"

  • Don't forsake the user
    Updating the screen is expensive, but if you don't you'll give the user the impression that your program is slow, even if it's actually doing its job faster without that stupid progress bar. And sometimes the progress bar is bad UI, when something richer like a progressively updated display of results is "faster" from the user's point of view--even though it increases overall time-to-completion by a factor of two or more

  • Don't forsake the developer
    Sometimes the above tips and more will make the code harder to understand, and this is the ultimate performance penalty. Over the long run, code that you can't comprehend is code you can't optimize when new options become available. If the profiler is telling you that the optimization was negligible, then go with the version that's easiest to read

What the compiler saw

 Consider this code:

int foo;
for (long i = 0; i < long.MaxValue; i++)
    foo = 1;

 The code says "loop 2^63 - 1 times, assigning 1 to the variable foo each time".

 This is what a compiler could optimize it to:

int foo = 1;

 Today's compilers are smart and getting smarter. Computer scientists (real scientists, not hacks like me writing tips) continue to put the best code optimizations into the compilers themselves, including statistical analysis to decide when, for example, it would be faster to inline a method, and when inlining would spoil the performance of the JIT. That modern compilers can perform this kind of analysis and still finish a build quicker than their predecessors is quite amazing.

 In the long run the best optimization you can make to your code is to follow the philosophy of the language, because that is what the compiler will be designed to optimize. If the language favors a functional or declarative style of programming, then avoid trying to do everything in an imperative fashion. Likewise, if the language clearly wants you to write explicit loops and conditionals, then don't try to force a functional style on top of it.

 Your relationship with the compiler is similar to the relationship web sites have with search engines, another area where "optimization" is a hot topic and the black arts are more often pretended than practiced. And just as with compilers, many of the search engine optimization tricks of the '90s will backfire and hurt you if you try to use them today.

Acknowledgements

Gratitude to the Coding Reddit crowd for feedback and corrections.