|
The CPU market is moving in a definite direction. We know almost everything
about Intel's and AMD's plans, so the CPU market can be called predictable.
But there is one thing we cannot find out from roadmaps: the actual
performance of future CPUs. Not even the manufacturer is able
to estimate the benchmark performance of a CPU in advance. They can tell us
a lot about the theoretical performance, which is similar to the
raw performance measured in operations per second, but
for the benchmark numbers they need to have a sample CPU. The marketing
team emphasizes the processor's advantages, but forgets to tell us
about its disadvantages. Any architectural change or core improvement
brings a list of disadvantages: weak compatibility, low performance
with old software, new motherboards, new compilers, etc. Currently
there are three x86 CPU manufacturers on the market: Intel, AMD
and VIA. The first two companies are well known and need no
further introduction, but VIA is not known for its processors.
A CPU is more than a chipset and involves a much more complicated
design and manufacturing process, and even if VIA has surprised us with
good chipsets, this doesn't mean that its processors have inherited
the same qualities. The VIA C3 processor line has not managed to
impress many people because of its weaker performance. But VIA claims
that low frequency processors have other areas of application
and that the public should consider them as well. Today we will try to
find these areas and discuss frankly the legitimacy
of choosing the lowest frequency CPU on the market.
Introduction
VIA didn't enter the CPU market without experience; in fact the C3 processor
is based on a combination of Centaur and Cyrix technology. The CPU
tries to take a different approach to the x86 architecture, but I
cannot call this approach new. It is known that the performance of x86
processors depends mostly on a small set of frequently used instructions.
The VIA C3 optimizes the execution time of the most frequent instructions
and limits the hardware spent on the less frequently used ones. These less used
instructions are implemented in microcode, a solution adopted
by many manufacturers that target the thin client market. There
are different opinions about the use of microcode to translate instructions
on desktop processors, but microcode is generally needed
to ensure compatibility even in high performance CPUs. For example,
64-bit processors will need to include translation logic to ensure
compatibility with software written for 32-bit processors. Generally every CPU
uses microcode stored in a very fast ROM to form the CPU's language.
If you are familiar with Transmeta Crusoe CPUs you probably know
that Crusoe uses Code Morphing software running inside the CPU to translate
instructions into its own standard instruction format, called VLIW
(very long instruction word). The CPU core is optimized to work with this
so-called processor language, so the variety of external instructions has
a very small impact on performance.
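
The split between hardwired and microcoded instructions can be pictured with a toy decoder. This is only a sketch: the instruction names assigned to each path, the internal operation names, and the `decode` and `decode_program` functions are all hypothetical and are not taken from VIA or Transmeta documentation.

```python
# Toy model of a decoder with a hardwired fast path and a microcode ROM.
# All tables below are illustrative assumptions, not real C3 internals.

# Frequent instructions handled directly by dedicated hardware (assumed set).
HARDWIRED = {"MOV", "ADD", "SUB", "CMP", "JMP"}

# Rare instructions expanded into several internal steps (assumed contents).
MICROCODE_ROM = {
    "ENTER": ["push_frame_pointer", "copy_stack_to_frame", "reserve_locals"],
    "LEAVE": ["copy_frame_to_stack", "pop_frame_pointer"],
}


def decode(instruction):
    """Return the list of internal operations for one instruction name."""
    if instruction in HARDWIRED:
        # Fast path: the instruction maps to a single internal operation.
        return [instruction.lower()]
    if instruction in MICROCODE_ROM:
        # Slow path: look up the stored sequence and return a copy of it.
        return list(MICROCODE_ROM[instruction])
    raise ValueError(f"unknown instruction: {instruction}")


def decode_program(program):
    """Decode a sequence of instruction names into one flat operation list."""
    operations = []
    for instruction in program:
        operations.extend(decode(instruction))
    return operations
```

In this model a frequent instruction produces one internal operation, while a rare one expands into several, which is where the extra time for microcoded instructions comes from.
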
Transmeta targets the mobile market and may be a competitor to the VIA
C3, but from the technological point of view the C3 sits somewhere between
a conventional x86 CPU from AMD or Intel and a Transmeta Crusoe. When
an instruction translation occurs, the CPU loses time, so performance
may not be as high as that of a traditional CPU running
at the same frequency. However, there is one big advantage: a
relatively low transistor count. In practice, the VIA C3
aims to provide good performance while keeping heat output
very low.
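
The cost of translation at a fixed clock frequency can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a description of the real C3: the function names are hypothetical, and the share of frequent instructions and the cycle counts are values the caller has to assume.

```python
# Back-of-the-envelope model of the cost of translating rare instructions.
# Every number passed to these functions is a made-up assumption.


def average_cycles(frequent_share, fast_cycles, translated_cycles):
    """Average cycles per instruction when a share of instructions takes
    the fast path and the remainder goes through translation."""
    if not 0.0 <= frequent_share <= 1.0:
        raise ValueError("frequent_share must be between 0 and 1")
    rare_share = 1.0 - frequent_share
    return frequent_share * fast_cycles + rare_share * translated_cycles


def instructions_per_second(clock_hz, cycles_per_instruction):
    """Throughput of a simple in-order core at a given clock frequency."""
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return clock_hz / cycles_per_instruction
```

For example, if we assume that 90% of instructions take 1 cycle and the remaining 10% take 10 cycles, `average_cycles(0.9, 1, 10)` gives about 1.9 cycles per instruction, so at the same frequency such a core would complete roughly half as many instructions as one averaging a single cycle.
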
From the engineering point of view, the VIA C3 is a very simple CPU.
Many advanced optimization techniques such as out-of-order execution
are not implemented because they require a lot of on-chip logic.
Instead the VIA C3 relies on advanced memory access techniques:
two 4-way 64 KB caches, two 8-way 128-entry TLBs, two four-entry
page directory caches, advanced prefetching, etc. I do not want
to discuss the caches, which most readers already know, but rather
the function and features of the TLB. The role of the Translation Lookaside
Buffer in a CPU is to store the most recently used page-directory and
page-table entries. If you like, think of the TLB as a kind of level
0 cache for address translations. The TLB speeds up memory access because
the page tables in memory do not have to be read for every operation.
The TLB plays a major role in system performance, and the VIA C3's TLB is
indeed impressive. The Pentium 4 also has a 128-entry TLB, but it is
4-way associative, which makes it more prone to conflict misses.
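
To see why associativity matters at equal capacity, here is a minimal simulation of a set-associative TLB. It is a toy model under stated assumptions: least-recently-used replacement, a set index taken as the virtual page number modulo the number of sets, and an access pattern chosen on purpose to collide in one set. The class and function names are hypothetical, and none of this describes the actual C3 or Pentium 4 replacement policy.

```python
from collections import OrderedDict


class SetAssociativeTLB:
    """Toy TLB with LRU replacement inside each set (assumed policy)."""

    def __init__(self, entries, ways):
        if entries % ways != 0:
            raise ValueError("entries must be a multiple of ways")
        self.ways = ways
        self.num_sets = entries // ways
        # One ordered dict per set; the least recently used entry comes first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, virtual_page):
        """Look up a virtual page number; return True on a hit."""
        tlb_set = self.sets[virtual_page % self.num_sets]
        if virtual_page in tlb_set:
            tlb_set.move_to_end(virtual_page)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(tlb_set) >= self.ways:
            tlb_set.popitem(last=False)  # evict the least recently used entry
        tlb_set[virtual_page] = True  # a real TLB would store the frame here
        return False


def count_misses(entries, ways, pages, passes):
    """Replay the same page sequence several times and count TLB misses."""
    tlb = SetAssociativeTLB(entries, ways)
    for _ in range(passes):
        for page in pages:
            tlb.access(page)
    return tlb.misses
```

With six pages spaced 32 apart (`pages = [i * 32 for i in range(6)]`) and ten passes, all six pages fall into a single set in both configurations. The 128-entry 8-way model misses only on the first pass, while the 128-entry 4-way model misses on every access. Real workloads are far less hostile, so the difference in practice is smaller than this worst case suggests.
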
The C3 pipeline facilitates higher frequencies because it has twelve
stages, a little longer than the Pentium III pipeline, but not huge
and potentially inefficient
like the Pentium 4 pipeline. Let's take a look at the pipeline. |