After reading Jeff’s terribly misguided post about NP-completeness I thought to myself: “If Jeff, who seems to be completely clueless about complexity theory, can write a blog post about it then so can I.”
So without further ado, here’s ob’s complex guide to simplifying complexity theory.
First of all let’s get the definitions out of the way.
Turing machine: An idealized computer, with infinite memory, an infinitely fast cpu, and unlimited bandwidth. It’s a mathematical model of computer so it can kick your Core 2 Duo’s sorry little ass faster than you can say Pentium. (update: I originally wanted to make the point that bandwidth and cpu speed are irrelevant, but it seems to have confused more than clarified.)
Problem: In complexity theory, a problem is really a decision problem. It has a simple yes/no answer. You don’t ask “take this list and sort it and print the last ten items” that’s not a decision problem. You do ask “does this program, given this input, produce the last ten items of the sorted input?”.
(update: Although the sort problem as outlined here would be a valid decision problem, it is confusing since just changing my “last ten items” to “last C items” where C is a function of n would make it not be a decision problem. Better to use some other example, like “is the number given prime?“)
n: Size of the input for the turing machine. E.g. if your turing machine is sorting strings, like in the example above, n is the number of symbols that it takes as input.
T(n): Number of steps (i.e. instructions) that a turing machine must execute to solve a problem with input of size n.
Polynomial Time: When T(n) can be expressed as an equation of n, and possibly some constants, using only addition, subtraction, multiplication, and constant non-negative whole number exponents. E.g. n² is a polynomial while 2ⁿ is not.
Non-deterministic turing machine: A turing machine that can, at every branch point, take all possible branches in parallel. We call this guessing the optimal answer because, since the turing machine takes all branches, it must take the optimal branch too. If any of the parallel turing machines finds an answer, the non-deterministic turing machine stops and gives that answer (this means that it’s ok for some branches to never end).
So take your already idealized computer from above and make it able to reproduce itself on every branch point. I.e. it can follow all possible decisions in parallel.
P: The class of problems that can be solved by a turing machine in polynomial time.
NP: The class of problems that can be solved by a non-deterministic turing machine in polynomial time.
Reducing a problem to another: Imagine you have two problems, P1 and P2. You know something about P1 (e.g. that it belongs to NP). Now you want to prove some property of P2, for instance, that it belongs to NP as well.
One technique is to show that if we could solve P2 in polynomial time (i.e. P2 belongs to P), we could use that answer to solve P1 in polynomial time. This would mean that P1 belongs to P which is a contradiction. This technique is called reductio ad absurdum. (update: I completely botched this and commenter Bibi correctly called me on it. Here’s an attempt to clarify a reduction: if you want to solve an instance of P2, you convert it to an instance of P1, and then solve P1 to get an answer. Thus, you’ve reduced P2 to P1. How’s that?)
NP-complete: We say a problem is NP-complete if it satisfies two properties: 1) it belongs to NP, and 2) every other problem that belogs to NP can be reduced, in polynomial time, to it.
This means that if we come up with a solution for an NP-complete problem that runs in polynomial time in a deterministic turing machine (i.e. belongs to P), any other problem in NP can be solved in polynomial time by first reducing it (again, in polynomial time) to this problem, and using our solution. We would have proved that P = NP and we would have also won (pinky to corner of mouth) one million dollars.
NP-hard: A problem P1, that is so hard, that even though we can prove number two above, we can’t prove number one. This means we know P1 to be just as hard as anything in NP and possibly harder, but we don’t know if P1 is in NP or not. P1 is very likely to require exponential time.
(update: I stand by my definition of NP-hard. Although not being able to prove #1 is a sufficient condition, it is not necessary. All problems in P are in NP, and yet we call them P, not NP. Thus, even if all problems that are NP-complete are also NP-hard, it’s uninteresting to call an NP-complete problem NP-hard. The point of calling a problem NP-hard is that, it’s really hard! Even if it turned out that P = NP, an NP-hard problem might still be hard!)
So there you have it, go impress your friends. I hear that “hey, baby! your love is NP-hard to get” is a very successful pickup line at pubs around M.I.T.