Tag Archives: big.LITTLE Processing

Eight-core MT6592 for superphones and big.LITTLE MT8135 for tablets implemented in 28nm HKMG are coming from MediaTek to further disrupt the operations of Qualcomm and Samsung

Updates

Update: The Power of 8: MediaTek True Octa-Core [mediateklab YouTube channel, July 29, 2013]

MediaTek is the first in the world to optimize and adopt True Octa-Core technology for the perfect balance of power and performance. Unlike existing octa-core solutions in the market, which can only activate half of their CPU cores at once, MediaTek True Octa-Core allows for all eight of its cores to run simultaneously, offering the ultimate combination of performance and power-efficiency. *Learn about MediaTek True Octa-Core Solution: http://www.mediatek.com/_en/Event/201307_TrueOctaCore/tureOcta.php

Update:  MT6592—The world’s first true octa-core SOC with scalable eight-core processing [product page, March 13, 2014]

Overview

MediaTek MT6592 is the world’s first heterogeneous computing SOC with scalable eight-core processing for superior multi-tasking, industry-leading multimedia features and excellent performance-per-watt. Based on 28nm HPM (High-Performance Mobile) process technology, MT6592 has eight CPU cores, each capable of clock speeds up to 2GHz.

Features

  • ARM® Cortex®-A7 processor (1.7GHz or 2GHz)
  • 28nm HPM process technology
  • MAGE 3D graphics engine
  • UMTS / HSPA+ R8 / TD-SCDMA / EDGE / LTE
  • 802.11a/b/g/n, Bluetooth, GPS, FM tuner
  • 16MP camera image signal processor
  • Full HD H.265 / VP9 and Ultra HD H.264 video playback
  • ARM Mali™ GPU (700MHz)
  • MediaTek ClearMotion™ video enhancement

Update: [€147.18] Cubot X6 OctaCore MT6592 Phone Ultimate Slim Design 5″ OGS HD Retina [arif rachman YouTube channel, March 1, 2014]

Cubot X6 OctaCore MT6592 Ultimate Slim Design
Please follow the link below to see the full specification
http://bit.ly/CubotX6
This is the latest phone from Cubot. It has the latest MT6592 processor at 1.7GHz, built on a 28nm process, with a quad-core Mali 450 GPU clocked at up to 700MHz. It supports full HD video decoding in widescreen format.
The true eight-core processor can run all of its cores simultaneously, using an advanced scheduling algorithm, dynamic temperature control and power management technology to optimize the distribution of workload to each core. When handling multiple tasks and heavy-duty workloads, it reaches the peak performance of all eight cores. At light load, cores can be turned off to save as much energy as possible when idle. This means a substantial increase in battery life.
The overall performance of the Mali 450 graphics processor is up to twice that of the previous Mali 400. It supports full HD at 60fps, with 152M triangles per second and a fill rate of 2.8G pixels per second. It should easily run 3D games and play back 1080p HD video smoothly. It also has a powerful built-in MAGE 3D engine.
The front camera is 5 megapixels while the back camera is 8 megapixels. The camera has a five-element high-precision glass lens with a blue-glass infrared filter. This is meant to approach the level of a professional SLR camera, far better than ordinary lenses. The phone uses a Sony sensor with the latest 13-megapixel CMOS image sensor.
In a week, the phone will be available at banggood for only $184.99 with free shipping worldwide! That’s an octa core phone below 200$ price tag! Not cheap enough?
Leave your email to get referred and get 10$ discount! Cheapest price out there!

IllusionMage [Wikipedia, excerpted on March 15, 2014]

IllusionMage is a paid-for 3D modeling, animation, and rendering software package comprising the core engine of Blender, an open-source 3D software suite, bundled with materials related to Blender.

Other names this bundle has gone under are IllusionMage3D, 3DMagix, and 3DMagixPro.[1]

All materials and software included are freely available from other sources. The marketing of this program includes images that were stolen from other sources, often created with competing 3D applications. The image of the alleged creator of the software, Seth Avery, is a random stock photo.[2]

Criticism

IllusionMage has come under fire from many prominent Blender news sites and figures, including Ton Roosendaal, the founder of the Blender Foundation.[3][4]

References

  1. “Illusion Mage & 3D Magix Pro (affiliate) domain names” Topic: Illusion Mage & 3D Magix Pro *is* a scam. KatsBits Forum. Retrieved 2 October 2011.
  2. “Handsome young man isolated over white”. Laflor Photography via iStockPhoto. Retrieved 8 May 2012.
  3. January 2011 Blender Foundation Press Release
  4. “3DMagix and IllusionMage, scam or open source leeches?”. BlenderNation. Retrieved 30 September 2011.

Update: 联发科平板四核心MT8135 官方成绩曝光 (Official benchmark results for MediaTek’s quad-core tablet SoC MT8135 revealed) [驱动之家 (MyDrivers.com), July 29, 2013]

image

Update: MediaTek’s Quad-core Tablet SoC MT8135 : Performance Benchmark [mediateklab YouTube channel, July 19, 2013]

MediaTek introduces industry leading tablet SoC -MT8135, which integrates ARM’s big.LITTLE™ processing subsystem and a PowerVR™ Series6 GPU from Imagination Technologies. MediaTek MT8135 fulfills the most demanding CPU and GPU usage scenarios, whether it is heavy web downloading, hardcore gaming, high-quality premium video viewing or rigorous multitasking, while maintaining the utmost power efficiency. In this video, you’ll see how MediaTek MT8135 outperforms today’s tablet solutions.

Update: MediaTek Introduces Industry Leading Tablet SoC, MT8135 [press release, July 29, 2013]

TAIWAN, Hsinchu – July 29, 2013 – MediaTek Inc., (2454: TT), a leading fabless semiconductor company for wireless communications and digital multimedia solutions, today announced its breakthrough MT8135 system-on-chip (SoC) for high-end tablets. The quad-core solution incorporates two high-performance ARM Cortex™-A15 and two ultra-efficient ARM Cortex™-A7 processors, and the latest GPU from Imagination Technologies, the PowerVR™ Series6. Complemented by a highly optimized ARM® big.LITTLE™ processing subsystem that allows for heterogeneous multi-processing, the resulting solution is primed to deliver premium user experiences. This includes the ability to seamlessly engage in a range of processor-intensive applications, including heavy web-downloading, hardcore gaming, high-quality video viewing and rigorous multitasking – all while maintaining the utmost power efficiency.
In line with its reputation for creating innovative, market-leading platform solutions, MediaTek has deployed an advanced scheduler algorithm, combined with adaptive thermal and interactive power management to maximize the performance and energy efficiency benefits of the ARM big.LITTLE™ architecture. This technology enables application software to access all of the processors in the big.LITTLE cluster simultaneously for a true heterogeneous experience. As the first company to enable heterogeneous multi-processing on a mobile SoC, MediaTek has uniquely positioned the MT8135 to support the next generation of tablet and mobile device designs.
“ARM big.LITTLE™ technology reduces processor energy consumption by up to 70 percent on common workloads, which is critical in the drive towards all-day battery life for mobile platforms,” said Noel Hurley, vice president, Strategy and Marketing, Processor Division, ARM. “We are pleased to see MediaTek’s MT8135 seizing on the opportunity offered by the big.LITTLE architecture to enable new services on a heterogeneous processing platform.”
“The move towards multi-tasking devices requires increased performance while creating greater power efficiency that can only be achieved through an optimized multi-core system approach. This means that multi-core processing capability is fast becoming a vital feature of mobile SoC solutions. The MT8135 is the first implementation of ARM’s big.LITTLE architecture to offer simultaneous heterogeneous multi-processing.  As such, MediaTek is taking the lead to improve battery life in next-generation tablet and mobile device designs by providing more flexibility to match tasks with the right-size core for better computational, graphical and multimedia performance,” said Mike Demler, Senior Analyst with The Linley Group.
The MT8135 features a MediaTek-developed four-in-one connectivity combination that includes Wi-Fi, Bluetooth 4.0, GPS and FM, designed to bring highly integrated wireless technologies and expanded functionality to market-leading multimedia tablets. The MT8135 also supports Wi-Fi certified Miracast™ which makes multimedia content sharing between devices remarkably easier.
In addition, the tablet SoC boasts unprecedented graphics performance enabled by its PowerVR™ Series6 GPU from Imagination Technologies. “We are proud to have partnered with MediaTek on their latest generation of tablet SoCs” says Tony King-Smith, EVP of marketing, Imagination. “PowerVR™ Series6 GPUs build on Imagination’s success in mobile and embedded markets to deliver the industry’s highest performance and efficient solutions for graphics-and-compute GPUs. MediaTek is a key lead partner for Imagination and its PowerVR™ Series6 GPU cores, so we expect the MT8135 to set an important benchmark for high-end gaming, smooth UIs and advanced browser-based graphics-rich applications in smartphones, tablets and other mobile devices. Thanks to our PowerVR™ Series6 GPU, we believe the MT8135 will deliver five-times or more the GPU-compute-performance of the previous generation of tablet processors.”
“At MediaTek, our goal is to enable each user to take maximum advantage of his or her mobile device.  The implementation and availability of the MT8135 brings an enjoyable multitasking experience to life without requiring users to sacrifice on quality or energy. As the leader in multi-core processing solutions, we are constantly optimizing these capabilities to bring them into the mainstream, so as to make them accessible to every user around the world,” said Joe Chen, GM of the Home Entertainment Business Unit at MediaTek.
The MT8135 is the latest SoC in MediaTek’s highly successful line of quad-core processors, which since its launch last December has given rise to more than 350 projects and over 150 mobile device models across the world. This latest solution, along with its comprehensive accompanying Reference Design, will like their predecessors fast become industry standards, particularly in the high-end tablet space.

Update: Optimized big.LITTLE – MediaTek [MediaTek, July 29, 2013]

Multi-core system-on-chip (SoC) design has brought tremendous benefits to mobile device users by offering seamless engagement in rigorous multitasking. To overcome the issue with high energy consumption and thermal readings, MediaTek is deploying an advanced scheduler algorithm, combined with adaptive thermal and interactive power management to maximize the performance and energy efficiency benefits of the ARM big.LITTLE™ architecture. The technology will allow applications software to simultaneously access all the processors in the big.LITTLE™ cluster for a true heterogeneous experience, activating both of its CPU clusters concurrently for extreme performance.
Optimized big.LITTLE™
ARM big.LITTLE™ processing is designed to address the energy and thermal issues associated with multi-core system-on-chip (SoC) solutions. It allows for the creation of dual-cluster SoCs, with one more powerful (big) cluster for processing intensive tasks and a less powerful (LITTLE) cluster for executing routine functions. MediaTek is among the first SoC designers to have adopted this ground-breaking technology. Unlike its counterparts, however, the company has done so in a manner that affords device users the utmost energy and thermal efficiency rates.
Enabling Heterogeneous Multi-Processing
Of the three big.LITTLE™ software models that can be integrated, for example, MediaTek chose the Heterogeneous Multi-Processing approach [developed and named by ARM as Global Task Scheduling (GTS), earlier also known as big.LITTLE MP; see the last section of this post for details], which unlike the other two methods, Cluster Migration [as implemented by Samsung in the Galaxy S4 with the Exynos 5 SoC in a 4xA7+4xA15 configuration] and CPU Migration [IKS (In Kernel Switcher), developed by Linaro; see the last section of this post for details], allows for individual cores to be activated as and when needed for maximum efficiency.
However, use of the most versatile model isn’t MediaTek’s only advantage. In line with its reputation for creating innovative, market-leading platform solutions, MediaTek has deployed an advanced scheduler algorithm, combined with adaptive thermal and interactive power management to maximize the performance and energy efficiency benefits of the ARM big.LITTLE™ architecture.
The technology will allow applications software to simultaneously access all the processors in the big.LITTLE™ cluster for a true heterogeneous experience, activating both of its CPU clusters concurrently for extreme performance.
In comparison, the current octa-core SoC solution utilizes one of the inferior big.LITTLE™ software models. As a result, the processor is not as efficient as it otherwise might be.
As the first company to enable Heterogeneous Multi-Processing on a mobile SoC in the form of its MT8135 Reference Design, MediaTek is uniquely positioned to support the next wave of tablet and mobile devices.
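
To make the difference between the three software models concrete, here is a minimal sketch (not MediaTek's or ARM's scheduler code) of how many cores each model can have running tasks at the same time. The `Soc` class, the function name and the model labels are made up for illustration, and the CPU-migration case assumes equal numbers of big and LITTLE cores paired one-to-one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Soc:
    big: int     # number of big cores (e.g. Cortex-A15)
    little: int  # number of LITTLE cores (e.g. Cortex-A7)


def max_usable_cores(soc: Soc, model: str) -> int:
    """Upper bound on the cores that can run tasks simultaneously."""
    if model == "cluster_migration":
        # Only one whole cluster is active at any time.
        return max(soc.big, soc.little)
    if model == "cpu_migration":
        # Each big core is paired with a LITTLE core (assumption: 1:1 pairs);
        # only one core of each pair runs at a time.
        return min(soc.big, soc.little)
    if model == "hmp":
        # Heterogeneous multi-processing: every core is visible to the scheduler.
        return soc.big + soc.little
    raise ValueError(f"unknown model: {model}")


octa = Soc(big=4, little=4)    # 4xA15+4xA7 configuration
mt8135 = Soc(big=2, little=2)  # 2xA15+2xA7 configuration

for name, soc in (("4+4", octa), ("2+2", mt8135)):
    for model in ("cluster_migration", "cpu_migration", "hmp"):
        print(f"{name} {model}: {max_usable_cores(soc, model)} cores")
```

For a 4xA15+4xA7 part this gives 4 usable cores under cluster or CPU migration and 8 under HMP; for the 2xA15+2xA7 MT8135 it gives 2 versus 4.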

Update: Optimized ARM big.LITTLE™ – MediaTek Enables ARM big.LITTLE™ Heterogeneous Multi-Processing Technology in Mobile SoCs [MediaTek Position Paper in PDF, July 29, 2013]

MediaTek MT8135 brings PowerVR Series6 GPUs to a mobile device near you [With Imagination Blog, July 29, 2013]

Over the years, our close partnership with MediaTek has resulted in the release of some very innovative platforms that have set important benchmarks for high-end gaming, smooth UIs and advanced browser-based graphics-rich applications in smartphones, tablets and other mobile devices. Two recent examples include:

MediaTek has been steadily establishing itself as an important global player for consumer products like smartphones, tablets and smart TVs, with a strong foothold in Latin America and Asia, and a rapidly growing presence in Europe and North America. Earlier this year, MediaTek introduced MT8125, one of their most successful tablet chipsets for high-end multimedia capabilities.

While MT8125 has been extremely popular with OEMs including Asus, Acer and Lenovo, MT8135 has the potential to consolidate MediaTek’s existing customer base and open up exciting new opportunities thanks to the advanced feature set provided by Imagination’s PowerVR ‘Rogue’ architecture.

MT8135 is a quad-core SoC that aims for the middle- to high-end tier of the tablet OEM market. It supports a 4-in-1 connectivity package that includes Wi-Fi, Bluetooth 4.0, GPS and FM radio, all developed in-house by MediaTek. Miracast is another important addition to the multimedia package, enabling devices using MT8135 to stream high-resolution content more easily to compatible displays, over wireless networks.

MT8135 incorporates a PowerVR G6200 GPU [which, judging from the block diagram, corresponds to the PowerVR G6230] from Imagination that enables advanced mobile graphics and compute applications for the mainstream consumer market, including fast gaming, 3D navigation and location-based services, camera vision, image processing, augmented reality applications, and smooth, high-resolution user interfaces.

As MT8135-powered mobile devices start appearing in the market, developers will have access to new technologies and features introduced by our PowerVR Series6 family such as:

  • our latest-generation tile based deferred rendering (TBDR) architecture implemented on universal scalable clusters (USC)
  • high-efficiency compression technologies that reduce memory bandwidth requirements, including lossless geometry compression and PVRTC/PVRTC2 texture compression
  • scalar processing to guarantee highest ALU utilization and easy programming

Thanks to the PowerVR G6200 GPU inside the MT8135 application processor, MediaTek brings high-quality, low-power graphics to unprecedented levels by delivering up to four times more ALU horsepower compared to MT8125, its PowerVR Series5XT-based predecessor. PowerVR G6200 fully supports a wide range of graphics APIs including OpenGL ES 1.1, 2.0 and 3.0, OpenGL 3.x, 4.x and DirectX 10_1, along with compute programming interfaces such as OpenCL 1.x, Renderscript and Filterscript.

By partnering up with Imagination, MediaTek has access to our industry-leading PowerVR graphics, worldwide technical support, and a strong ecosystem of Android developers capable of making the most of our technology. We look forward to shortly seeing our brand-new PowerVR Series6 GPUs in the hands of millions of consumers, and see MediaTek as one of our strategic partners for our latest generation PowerVR GPUs moving forward.

End of Updates

This report consists of the following parts:

  • The latest MediaTek roadmap, high-end and OS strategy
  • News reports about MT6592 and its first application

  • Update: MediaTek True Octa [MediaTek, July 23, 2013]
    Efficient video playback:
    When on decoding mode, the battery used for decoding HEVC (H.265) FHD video
    can be reduced by up to 18 percent compared to current quad-core solutions
    (from MediaTek True Octa-Core Position Paper [MediaTek, July 23, 2013])
  • What is new vs. my earlier The state of big.LITTLE processing [‘Experiencing the Cloud’, April 7, 2013] report

For the preceding smartphone SoC in the current roadmap see MediaTek MT6589 quad-core Cortex-A7 SoC with HSPA+ and TD-SCDMA is available for Android smartphones and tablets of Q1 delivery [‘Experiencing the cloud’, Dec 12, 2012]. For smartphone SoCs before that see Boosting the MediaTek MT6575 success story with the MT6577 announcement – UPDATED with MT6588/83 coming early 2013 in Q42012 and 8-core MT6599 in 2013 [‘Experiencing the cloud’, June 27, July 27, Sept 11-13, Sept 26, Oct 2, 2012]. Note that MT6588 was renamed MT6589 when it was launched, just as MT6599 has now been renamed MT6592.


The latest MediaTek roadmap, high-end and OS strategy

Maybank Kim Eng just published in its MediaTek Closing In Fast [July 17, 2013] report the following two SoC roadmaps:

image

GPU for MT6592 smartphone SoCs (and presumably for MT6588 as well) will be Mali according to Zhu Shangzu (朱尚祖), MediaTek Global Smartphone General Manager in the [Part 2] MediaTek to push 8 small cores, the mystery [ESM 国际电子商情 (International Electronic Business), July 18, 2013] exclusive interview.
According to the 28nm Technology [TSMC, June 21, 2011] description, the 28nm technology node of the TSMC foundry (which MediaTek uses for manufacturing) has a high performance (HP) process as the first option to use high-k metal gate (HKMG) process technology. The 28nm low power with high-k metal gates (HPL) technology, as the second option, adopts the same gate stack as HP technology while meeting more stringent low leakage requirements with a trade-off in performance speed.

Explanation: From about 10 µm (1971) down to below 0.1 µm (100 nm), CMOS technology used conventional silicon oxynitride as the gate insulator with a polysilicon gate, the so-called poly/SiON gate stack. It was typically possible to scale this down to 45 nm (2008); only TSMC was able to scale it down further, to 28 nm, and most of the current 28nm SoCs from TSMC are produced this way. Intel (and IBM), by contrast, had to introduce a high-k dielectric as the gate insulator with a metal gate, the so-called High-k / Metal Gate stack, for the performance of their 45 nm products in 2008 (in order to continue with Moore’s law in their realm), as you could see on the right (taken from the Life With “Penryn” [DailyTech, Jan 27, 2007] interview with Mark Bohr, Intel Senior Fellow, and Steve Smith, Intel Vice President DEG Group Operations). TSMC could introduce it only on the 28nm node, as described above. The HKMG-based 28nm SoCs offer much higher performance (or still higher performance at low power with HPL), as you could see from the 2GHz clock rate of the MT6592 (above) or MT8135 (below) vs. the 1.2GHz of their conventional poly/SiON counterparts, MT6589 and MT8389.

image

Complementary post reminder: H2CY13: Upcoming next-gen Nexus 7, the ASUS MeMO Pad HD 7 “re-incarnation” at reduced by $50 price, dual/quad-core mid-range tablets from white-box vendors starting from $65 [‘Experiencing the Cloud’, July 5, 2013], in which there is plenty of information regarding the non-high-end tablet SoCs from MediaTek (MediaTek MT8125, MediaTek MT8377 and MediaTek MT8389) as well as the competition from Allwinner and Rockchip. The pre-eminent ASUS MeMO Pad™ HD 7 described in detail there uses the MT8125 SoC, while the new Nexus 7 (to be announced before the end of July) uses the Qualcomm Snapdragon 600 Quad Core SoC. In that sense that post gives not only a complete H2 competitive tablet market picture for the mid-range but some information regarding the new Qualcomm high-end as well.
For the upcoming MT8135 tablet SoC it is known from part 3 of the Zhu Shangzu interview that the quad-core configuration will be 2xA15+2xA7, which means a big.LITTLE architecture. I originally expected it to start with the already mature ‘In Kernel Switcher’ (IKS) scheduler; as it turned out, it uses GTS with MediaTek’s “advanced scheduler algorithm, combined with adaptive thermal and interactive power management”, which MediaTek calls Heterogeneous Multi-Processing (HMP) (see the updates in front of the original post). As ARM had already decided on the architecture of the other, more general ‘Global Task Scheduling’ (GTS) solution (see much below), I assumed that the proper hardware underpinnings for GTS would already be built in (unlike in Samsung’s Exynos 5 SoC released before), so that once the scheduler software was mature enough it would run well on MT8135. The inclusion of just two cores of each type (unlike in Exynos 5) is a very strong proof point of that. As far as the GPU is concerned, we know from the Zhu Shangzu interview that an Imagination GPU will be used, therefore I will leave the next-generation SGX6XX (PowerVR Series6 or ‘Rogue’) indication in the above table. Update: It is the PowerVR G6200 GPU [which, judging from the block diagram, corresponds to the PowerVR G6230], as you could see from the Imagination blog post published on the MT8135 announcement (July 29) and included here in front of the original post.

with the following commentary:

Strong fundamentals intact. Having exceeded its 2Q13 guidance so significantly, we believe MTK will continue to ride the strong momentum in 3Q13, perhaps growing its revenue by low-to-mid-teens QoQ or 30% YoY to chalk up another record high of TWD36-38b [US$1.2-1.27B]. Importantly, a better product mix and cost structure would help lift its profitability to ±44%. We expect MTK to ship 70-72m units of smart devices, up 25-30% QoQ, with quad-core APs and tablets making up nearly 50% of total shipment. The benefits of operating leverage should drive OPM past 20%, the highest since 3Q10. MTK is set to report its 2Q13 results in late July or early August and we forecast net profit of TWD6.8b [US$227M] (EPS: TWD5.02; Street: TWD6.3b), up over 80% QoQ and 100% YoY. GM is also likely to meet the high end of its guidance, ie, 43.5%, on richer mix and improved cost structure. Reported revenue of TWD33.3b, up almost 40% QoQ and 42%YoY, is already well ahead of guidance (TWD30-32b). However, we cut our FY13/14 earnings forecasts by 3% each to factor in the delay in merger with MStar and potential inventory correction in 4Q13/1Q14. MTK remains a key BUY in our tech space.

Closing in fast on QCOM. MTK has spared no efforts to enhance its smart device portfolio since 2H12 and further signs of acceleration are evident. It is introducing two high-end APs in 4Q13 – MT6588 and MT6592 – using 28nm HKMG and advanced graphic features. While the former is a quad-core AP operating at 1.7GHz, the latter is capable of running at 2GHz (when all eight core engines are turned on). In the absence of full details, we estimate MT6592 may perform closer to Qualcomm Snapdragon 600 AP (used in Galaxy S4 and HTC One), while MT6588 should outshine Snapdragon 400. MTK has won several international OEMs with MT6589 and with MT6588/6592, its chances of penetrating tier-1 OEMs have increased significantly. In addition, it will sample its high-end 4G/LTE/LTE-TDSCDMA modem chipset in anticipation of the launch of 4G network in China later this year. As for tablets, MTK’s latest APs MT8125/8389 were well-received and it is set to deliver the high-end MT8135 (big.Little design) in 3Q13. We expect its smartphone/tablet shipments to reach 200-225m/25m units in 2013.
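
The figures in the first paragraph of this commentary can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above (the TWD33.3b reported 2Q13 revenue, the TWD36-38b range expected for 3Q13 and the bracketed US$ conversions); the helper name is arbitrary.

```python
def growth_pct(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new / old - 1.0) * 100.0


Q2_REVENUE = 33.3             # TWD bn, reported 2Q13 revenue
Q3_LOW, Q3_HIGH = 36.0, 38.0  # TWD bn, expected 3Q13 range

qoq_low = growth_pct(Q3_LOW, Q2_REVENUE)
qoq_high = growth_pct(Q3_HIGH, Q2_REVENUE)
print(f"Implied 3Q13 growth: {qoq_low:.1f}% to {qoq_high:.1f}% QoQ")

# Exchange rate implied by the bracketed conversions (TWD per US$).
for twd_bn, usd_bn in ((36.0, 1.2), (38.0, 1.27), (6.8, 0.227)):
    print(f"TWD{twd_bn}b / US${usd_bn}b = {twd_bn / usd_bn:.1f} TWD per US$")
```

The revenue range works out to roughly 8% to 14% QoQ growth, and all three conversions imply a rate of about 30 TWD per US$.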

In the same part of the interview Zhu Shangzu explained MediaTek’s high-end strategy as follows (as translated by Google and Bing with manual edits):

… I think the future of high-end smartphones innovation will focus on the expansion of big screen multimedia applications, and this is our direction. …

Judging from the current situation, customers making high-end flagship phones are still using the products of our competitors, but there are flagships among our quad-core designs as well, and quad-core phones from OPPO, Vivo, GiONEE and others are also very popular. Our next goal is to get the makers of flagship devices to use our platform by helping them achieve stronger big-screen multimedia performance.

Therefore, the 8-core MT6592 can be regarded as our first bugle call for moving towards the high-end market. Our mission is that one day customers can also recognize MediaTek as doing high-end flagship products. MT6592 is the first step, strictly speaking, it is not the most high-end platform, next we will move step by step towards the higher end.

Q: Why will MediaTek use eight small A7 cores as a generation of high-end platform, but did not choose to use four large A15 cores or four big and four small ones as a way to achieve the goal? This is also a question for the industry as there are many controversial issues with this.  

For power, or performance per watt, we did a lot of investigation. Eight A7 cores is currently the best solution, and since we designed a process to boost the peak frequency of the A7 to 1.9-2GHz, performance is also very strong.

Currently we chose a small core because, under the existing process, the larger the chip die size, the larger the standby leakage, resulting in higher standby power consumption. For example, the A15 is currently the strongest core, but not in run-time power consumption. Even if its frequency is pushed down to very low levels, there is still a larger leakage. Therefore, the larger the area of a single core, the larger the energy-efficiency overhead, and as long as the power is on, there will be greater leakage.

In addition, the 8-core CPU is just one aspect of improving the mobile multimedia experience. In fact, as MediaTek has been doing digital TV for a long time, we will extend that digital TV competency here, which is a strong move for smartphones. This is what other platform vendors cannot do. In the 6592, for example, the latest HEVC codec will be integrated. [HEVC is a video compression standard, a successor to H.264/MPEG-4 AVC]

Although our MT6592 GPU is also a ‘Deluxe’ Mali quad-core GPU, in order for content developers to achieve better compatibility our HEVC is a software solution running on the 8-core CPU; it is not a GPU-based software solution. This is because some strong content developers will use their own HEVC decoder. Currently the ‘Deluxe’ quad-core GPU on the 6592 is mainly used to run large-scale games and some advanced UI.
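
The leakage argument in the answer above can be illustrated with a first-order power model. This is only a sketch: the coefficients and core areas below are invented placeholder numbers, not measured A7 or A15 figures, and voltage scaling is ignored. Dynamic power is taken as proportional to area times frequency, while leakage is proportional to area alone.

```python
def core_power_mw(area_mm2: float, freq_ghz: float,
                  dyn_coeff: float = 100.0,  # hypothetical mW per (mm^2 * GHz)
                  leak_coeff: float = 20.0   # hypothetical mW per mm^2
                  ) -> tuple[float, float]:
    """Return (dynamic, leakage) power in mW for one powered-on core."""
    dynamic = dyn_coeff * area_mm2 * freq_ghz  # scales with the clock
    leakage = leak_coeff * area_mm2            # independent of the clock
    return dynamic, leakage


SMALL_AREA = 0.5  # hypothetical small-core area in mm^2
BIG_AREA = 2.5    # hypothetical big-core area in mm^2

for name, area in (("small", SMALL_AREA), ("big", BIG_AREA)):
    for freq in (2.0, 0.2):
        dyn, leak = core_power_mw(area, freq)
        print(f"{name} core @ {freq} GHz: "
              f"dynamic {dyn:.0f} mW, leakage {leak:.0f} mW")
```

In this toy model, lowering the clock by a factor of ten cuts dynamic power by the same factor, but the leakage term of the larger core stays five times that of the smaller one for as long as the core is powered.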

[Part 3] How to plan the future in the tablet market?

Q: I do note that the MT6592 is now using a quad-core Mali GPU, while before the MediaTek mainstream used Imagination GPU. How would you rate these two companies’ products?

Imagination has a long history of making GPUs; its architecture design is beautiful, more artistic. The initial Mali architecture [from ARM] was rougher, and therefore worse in area and power consumption. But after nearly three years Mali has made a lot of progress, both are learning from each other, and by now the two are on an equal level. Looking ahead, ARM’s overall resources are somewhat more plentiful.

Q: This year we have seen MediaTek attack the tablet market. What is the plan for the future in the tablet market?

A: Our current strategy is to carry out a mobile phone product line extension.

At the end of July the launch of a tablet chip is expected: the MT8135, with 2xA15+2xA7, still using an Imagination GPU [Update: It is the PowerVR G6200 GPU (which, judging from the block diagram, corresponds to the PowerVR G6230), as you could see from the Imagination blog post published on the MT8135 announcement (July 29) and included here in front of the original post], and mainly targeting the high-end tablet market. A small reminder: our MT6572 is not suitable for tablet computers, as its original definition did not take tablet applications into account.

[Part 6] If Google closes and converges the Android OS, how will MediaTek respond?

Q: There is also a very large concern: the industry is worried about whether Google, after doing its own hardware next year (e.g. Xphone, watches, glasses, etc.), will close the Android OS, i.e. do a Pure Android later on and not let OEMs change it. That would also have a very big impact on MTK; what do you think? What is MTK’s attitude toward other free OSs?

A: If the Google OS is closed and converged, that will have a huge impact on us. But from what we observe and from our communication with Google, they will not close the OS or converge it. Google’s profitability does not depend on the OS; it relies on services for profit. By doing hardware Google also aims to promote its services, and it is very happy to have its own services used on someone else’s devices.

Of course, we will also be prepared, as we comprehensively examine and take into account the prevailing factors. We will use Windows as a second priority, while keeping track of Firefox [OS] and HTML5 as a secondary backup. Because we judge the likelihood of [Android] OS convergence to be very low from the point of view of Google’s profitability, our vote for these two emerging open OSs, in addition to and outside of Android, is in the ‘not so urgent’ category. The other focus is again on Windows Phone 8. For the moment, however, WP8 hardware configuration requirements are still higher (mainly memory), while the gap in power consumption with Android, after optimization, is not too large.


News reports about MT6592 and its first application

Update: MediaTek True Octa [MediaTek, July 23, 2013]

Efficient video playback:
When on decoding mode, the battery used for decoding HEVC (H.265) FHD video
can be reduced by up to 18 percent compared to current quad-core solutions
(from MediaTek True Octa-Core Position Paper [MediaTek, July 23, 2013])
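
What an up to 18 percent cut in decoding energy means for playback time depends on how much of the total playback power goes to decoding, which the quote does not state. The sketch below therefore treats that share as a hypothetical input; the function name and the 0.3 example value are illustrative assumptions only.

```python
def playback_time_gain(decode_share: float, decode_saving: float = 0.18) -> float:
    """Fractional increase in playback time on the same battery.

    decode_share:  fraction of total playback power spent on video decoding
                   (hypothetical; not given in the position paper quote).
    decode_saving: fractional reduction in decoding energy (up to 0.18).
    """
    if not 0.0 <= decode_share <= 1.0:
        raise ValueError("decode_share must be between 0 and 1")
    remaining_power = 1.0 - decode_share * decode_saving
    return 1.0 / remaining_power - 1.0


for share in (1.0, 0.3):
    gain = playback_time_gain(share)
    print(f"decode share {share:.0%}: playback time +{gain:.1%}")
```

If decoding were the only consumer, an 18 percent saving would stretch playback time by about 22 percent; with a hypothetical 30 percent decode share the gain shrinks to under 6 percent, since the display and the rest of the system keep drawing the same power.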

On July 18 this information also appeared on the English website (http://en.v5zn.com/) of the smartphone vendor concerned: MediaTek MT6592’s first eight-core mobile phone exposure makes you believe [July 15, 2013], as translated by Google and Bing with manual edits.

MediaTek’s so-called true eight-core processor, the MT6592, was announced not long ago, and the first models equipped with the processor are expected to surface. News has broken that the domestic mobile phone brand named after the 19th-century French writer Jules Verne [凡尔纳] has decided to launch a flagship model, the “V8”, equipped with the MT6592 processor.

Verne’s current main product is the “V5” model, equipped with a quad-core MediaTek MT6589, and a 5-inch 720p OGS full lamination screen, 1GB of RAM, 4GB storage, 8-megapixel back-illuminated camera, 2400 mAh Battery, with a list price of 999 yuan [$166].

The exact configuration of the V8 has not yet been announced, but it is estimated to have a roughly 5.5-inch 1080p screen, 2GB RAM, 32GB storage, a 13-megapixel Sony stacked camera, a higher capacity battery, etc.; without these it could hardly call itself a flagship.

It looks like that cooperation between MediaTek and the domestic Shanzhai vendors remains close. As MT6589 has rocked the Main Street, MT6592 will soon become a standard, and “an eight-core” promotion will be overwhelming.

Incidentally recap: MT6592 uses eight Cortex-A7 architecture cores, clocked at up to 2.0GHz, with TSMC 28nm manufacturing, Antutu run is known as close to 30,000, but the graphics core has not been confirmed, PowerVR SGX 544MP4/554MP4 are likely [it will be Mali, as communicated by MediaTek, see above].

The marketing of the processor has begun to customers, but mass production will be in November, so if recent high profile publicity is to be fulfilled, certainly we will have a large sale early next year.

Company introduction [Jules Verne mobile phone, January 16, 2013] as translated by Google and Bing with manual edits

Shenzhen MINDRAY Platinum Communication Technology Ltd. is a high-tech company specialized in the development, production, sales and service of intelligent mobile terminals. Under the “Jules Verne VOWNEY” brand the company is creating a mobile intelligent terminal brand.
With “intelligent life” as its brand mission, the MINDRAY Platinum company has the goal to “enhance the user experience, to help people grasp the development opportunities”, trying to make Jules Verne a trustworthy, continuously innovating guide to smart mobile life. Every effort is just for you!
Jules Verne mobile phones are sold directly online, stripping out the layers of agents and passing the benefits directly to consumers. We are committed to allowing more consumers to have a better quality of life with an intelligent terminal.
The “Jules Verne VOWNEY” brand aspires to be a smart-terminal guide that improves the quality of life of mobile users, leading you into the “Slide 5.0” era.
Brand interpretation
Jules Verne: a derivative of intelligent life???
English explanation : VOWNEY
V : value— Value
O : opportunity— Opportunity
W : worth— It is worth
N : new— New
E : e— Mobile Internet
Y : you— You
Jules Verne is to ” create a new life guided smart” as the goal, and strive to become a trusted, sustainable and innovative mobile phone brand, all efforts, just because of you!

Mediatek MT6592 8 core processors coming by the end of July! [Gizchina.com]

Reports out of Taiwan state that Mediatek will launch the MT6592 8-core processor by the end of July.

There was word that Mediatek were working on an 8 core chipset late last year, but like many we believed it had been placed on the back burner while they prepared their LTE chip. This seems to be wrong though as sources in Taiwan claim that Mediatek’s 8-core processor will arrive before the end of this month!

The MT6592 chip will be made up of 8 Cortex-A7, 28nm processor clocked at a frequency of up to 2Ghz! Early tests have the 8 core MT6592 scoring up to 30,000 points in Antutu which is more than Samsung’s 8 core Exynos 5410 processor.

The first batch of these new processors will be ready for manufacturers to begin development by the end of July, while Mediatek are preparing full-scale manufacture for November!

If everything goes to plan we can expect powerful 8 core phones from Tier 1 Chinese phone manufacturers by December!

MediaTek to launch true 8-core, 2GHz MT6592 chipset in November? [Engadget, July 2, 2013]

Samsung may already have its 8-core Exynos 5 Octa offering, but the original “big.LITTLE” implementation means only up to four cores work together at any time — either the Cortex-A15 quartet or its lesser Cortex-A7 counterpart. In other words, we’d rather rename the chipset range to something like “Exynos 5 Quad Dual.” But according to recent intel coming from Taipei and Shenzhen, it looks like Taiwan’s MediaTek is well on its way to ship a true 8-core mobile chipset in Q4 this year.

The first mention of this 2GHz, Cortex-A7 MT6592 chip came from UDN earlier today. The Taiwanese publication claims MediaTek started introducing its first octa-core product to clients last week, and it’s expected to enter mass production using TSMC’s 28nm process in November. The first mobile devices to carry this hot piece of silicon may hit the market in early 2014 — hopefully just in time for the Chinese New Year shopping rush.

UDN adds that the MT6592 scored close to 30,000 on AnTuTu, which is pretty high but still some distance behind Qualcomm’s 2.2GHz quad-core Snapdragon 800. Of course, chances are MediaTek’s offering will be much cheaper, as evidenced by all the affordable MediaTek-powered devices in China these days.

In a separate article from last week, UDN pointed out that judging by over a hundred job openings released by MediaTek last month, the company is clearly putting an emphasis on 4G LTE technology, alongside GPU and Android development. The publication also quoted chairman Tsai Ming-kai saying he will launch an LTE solution in Q4 this year, by which point MediaTek will only be one or two years behind its competitors.

The second piece of info came from HQ Research analyst Pan Jiutang, who posted an alleged spy shot of MediaTek’s upcoming roadmap (pictured left). There the octa-core MT6592 is listed with a clock speed of 1.7GHz to 2GHz, along with 1080p 30fps video decoding support. There’s also a quad-core 1.7GHz MT6588 accompanying its octa-core sibling in the same period on the timeline, though it appears to be just a faster version of the current 1.2GHz MT6589.

For the sake of phone manufacturers, both new chipsets will apparently be pin-to-pin compatible with the quad-core 1.3GHz MT6582 due Q3 this year, thus lowering R&D costs. Better yet, the roadmap also states that the MT6290 LTE modem — as teased by Tsai above — will be compatible with these three chipsets.

With MediaTek quickly catching up ahead of China’s eventual TD-LTE launch, Qualcomm will need to tread carefully to keep its Chinese QRD partners happy.

[Thanks, Ryan!]

Update: It’s worth noting that ARM’s eventual “big.LITTLE MP” implementation will allow all eight cores to run simultaneously, but the Exynos 5 Octa currently doesn’t support this. Thanks, UncleAlbert!

SOURCE: Sina Weibo (login required), UDN (1), (2)


What is new vs. my earlier
The state of big.LITTLE processing [‘Experiencing the Cloud’, April 7, 2013] report

Power scheduler design proposal [by Morten Rasmussen from ARM on Linux kernel mailing list, July 9, 2013]

This patch set is an initial prototype aiming at the overall power-aware scheduler design proposal that I previously described <http://permalink.gmane.org/gmane.linux.kernel/1508480>.

The patch set introduces a cpu capacity managing ‘power scheduler’ which lives by the side of the existing (process) scheduler. Its role is to monitor the system load and decide which cpus should be available to the process scheduler. Long term the power scheduler is intended to replace the currently distributed uncoordinated power management policies and will interface with a unified platform-specific power driver to obtain power topology information and handle idle and P-states. The power driver interface should be made flexible enough to support multiple platforms including Intel and ARM.

This prototype supports very simple task packing and adds a cpufreq wrapper governor that allows the power scheduler to drive P-state selection. The prototype policy is absolutely untuned, but this will be addressed in the future. Scalability improvements, such as avoiding iterating over all cpus, will also be addressed in the future.

Thanks,
Morten

From <http://permalink.gmane.org/gmane.linux.kernel/1508480>

                         +-----------------+
                         |                 |     +----------+
         current load    | Power scheduler |<----+ cpufreq  |
              +--------->| sched/power.c   +---->| driver   |
              |          |                 |     +----------+
              |          +-------+---------+
              |             ^    |
        +-----+---------+   |    |
        |               |   |    | available capacity
        | Scheduler     |<--+----+ (e.g. cpu_power)
        | sched/fair.c  |   |
        |               +--+|
        +---------------+  ||
           ^               ||
           |               v|
 +---------+--------+  +----------+
 | task load metric |  | cpuidle  |
 | arch/*           |  | driver   |
 +------------------+  +----------+
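
The split of responsibilities in the diagram can be illustrated with a toy model. The following Python sketch is not kernel code: the class name, the `update` method and both thresholds are hypothetical and only show the idea of a separate component that watches the load and tells the process scheduler how many cpus it may use.

```python
from dataclasses import dataclass


@dataclass
class PowerScheduler:
    """Toy capacity manager: decides how many CPUs the task scheduler may use."""
    num_cpus: int
    # Hypothetical thresholds, as fractions of the capacity currently online.
    grow_above: float = 0.80
    shrink_below: float = 0.40
    online: int = 1

    def update(self, total_load: float) -> int:
        """total_load is the summed task load; 1.0 equals one fully busy CPU."""
        utilization = total_load / self.online
        if utilization > self.grow_above and self.online < self.num_cpus:
            self.online += 1      # offer one more CPU to the process scheduler
        elif utilization < self.shrink_below and self.online > 1:
            self.online -= 1      # withdraw a CPU so that it can go idle
        return self.online
```

Calling `update` once per load sample grows or shrinks the available capacity by one cpu at a time; a real policy would also have to weigh idle-state and frequency-change costs, which this sketch ignores.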

Linux Kernel News – June 2013 [by Shuah Khan in Linux Journal , July 9, 2013]

As always the Linux kernel community has been busy moving the Linux mainline to another finish line and the stable and extended releases to the next bump in their revisions to fix security and bug fixes. It is a steady and methodical evolution process which is intriguing to follow. Here is my take on the happenings in the Linux kernel world during June 2013.
Mainline Release (Linus’s tree) News
Linus Torvalds released Linux 3.10. You can read what Linus Torvalds had to say about this release in his release announcement at http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/04336.html
Two notable features in this release are improved SSD caching and better Radeon graphics driver Power Management.

Power efficient scheduling design
Ingo Molnar (Red Hat, x86 maintainer), Morten Rasmussen (ARM, power mgmt.), Priti Murthy (IBM, scheduler), Rafael Wysocki (Intel, Linux PM, and Linux ACPI maintainer) and Arjan van de Ven discussed the proposed power-aware or power-efficient scheduler design and what’s the best way to integrate it into the kernel.
Power management and the ability to balance performance and power efficiency is important and complex. It is not just about scheduler or cpus. It spans I/O devices that transition into lower-power states and how costly it is to bring them back to fully active state when needed. There is latency involved in these transitions. As always, Linux developers reach consensus to solve complex problems such as these and come up with path to get to the goal taking small steps towards that goal. Here is another example of that process at work.

Power-efficient scheduler work has been active for a few months now. Several RFC patches have been floated and discussed. This work is being pursued very actively in x86 space by IBM and in ARM space by ARM. The premise is that, when the scheduling goal is power savings over performance, the scheduler could pack tasks on a few cores, keep those cores fully utilized, and transition the other cores to low-power states. In other words, instead of keeping all the cores active, the scheduler could consolidate tasks on a few cores and transition other cores to low-power states for better power efficiency.

It is easier said than done. Scheduler is at a higher level and would not be the best judge of making decisions on transitioning CPUs to idle states and deciding on the ideal frequency they should be running at. These decisions are better left to platform drivers that have the specific knowledge of the platform and architecture as they are complex and very hardware specific. In other words, power aware scheduler tuned to run well on x86 platforms will not work as well or could fail miserably on ARM platforms.
Scheduler has to accomplish load balancing as well as power balancing in a way to meet performance and power goals and do it well on all platforms. A generic scheduler doesn’t have to control and drive low-power state decisions on a platform. However, the goal of power-efficient scheduler is to set higher level abstracted policies that would work on all platforms. After a long and productive discussion, there is a consensus and here is the summary:
  • A new kernel configuration option CONFIG_SCHED_POWER to enable/disable the power scheduler feature. Power scheduler is totally inactive, when CONFIG_SCHED_POWER is disabled, and fully active when CONFIG_SCHED_POWER is enabled. The important goal is evolving the power scheduler feature without disrupting and destabilizing the current scheduler.
  • Work on a generic power scheduler with hardware and platform abstractions that will work well on big.LITTLE ARM, x86, and other platforms. Avoid platform specific power policies that could lead to duplication of functionality in platform specific power drivers.

Please check the Linux Foundation site for presentations made at the Linux Collaboration Summit back in April 2013 on this topic. Here is the link to Jonathan Corbet’s blog on this topic.
http://www.linux.com/news/featured-blogs/200-libby-clark/715486-boosting…

From: big.LITTLE Software Update [by George Grey on Linaro Blog, July 10, 2013]

There are also two software models now available, that ARM and Linaro have developed to enable control of workloads, performance, and power management on big.LITTLE SoCs.

The first is the IKS [In Kernel Switcher, also known as CPU Migration] software, developed by Linaro, that treats each pair of Cortex-A7 and Cortex-A15 cores as a single ‘virtual’ core. On a multicore SoC each pair is treated as 1 of n virtual symmetric cores by the Linux kernel.

Core Software Configuration for IKS (4+4)

image

Using existing mechanisms in the Linux kernel for each pair the cpufreq driver controls whether the Cortex-A7 is active (for low power) or the Cortex-A15 is active (for maximum performance). Overall maximum performance and throughput on a 4+4 core SoC is from 4 Cortex-A15s. The key attribute of IKS is that it relies on existing well-understood mechanisms in the Linux kernel and it is easy to implement, test and characterize in a production environment.
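
The pairing idea can be sketched in a few lines of Python. This is only an illustration, not the Linaro implementation: the frequency values and the function names are hypothetical, and the load-to-operating-point mapping is deliberately simplistic. The point is that one merged table per virtual core lets a frequency request implicitly choose which physical core of the pair is active.

```python
# Hypothetical operating points in MHz; real tables are SoC-specific.
A7_FREQS = [350, 500, 800, 1000]
A15_FREQS = [1200, 1400, 1600]


def build_virtual_table(little, big):
    """Merge both cores' operating points into one table for a virtual core.

    Entries are ordered from lowest to highest performance, so the LITTLE
    core's points come first and the big core's points come last.
    """
    return [("Cortex-A7", f) for f in little] + [("Cortex-A15", f) for f in big]


def select_operating_point(table, load):
    """Map a load between 0.0 and 1.0 linearly onto an index in the table."""
    index = min(int(load * len(table)), len(table) - 1)
    return table[index]


table = build_virtual_table(A7_FREQS, A15_FREQS)
light = select_operating_point(table, 0.1)   # lands in the Cortex-A7 range
heavy = select_operating_point(table, 1.0)   # lands in the Cortex-A15 range
```

Because the kernel only ever sees the virtual core, at most one core of each pair runs at a time, which is why peak throughput on a 4+4 part comes from the four Cortex-A15 cores.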

The second is the Global Task Scheduling (GTS) [also known as big.LITTLE MP or Heterogeneous Multi-Processing (HMP)] software developed (and now named) by ARM. This is known in Linaro as big.LITTLE MP. Using GTS all of the big and LITTLE cores are available to the Linux kernel for scheduling tasks. We are very proud that Linaro has contributed to ARM’s development of the GTS software, and that it is now publicly available in Linaro builds. ARM and Linaro recommend GTS for new products, and Linaro members are actively planning product deployments using this solution.

Core Software Configuration for GTS (4+4)

image

The big.LITTLE MP patch set creates a list of Cortex-A15 and Cortex-A7 cores that is used to pick the target core for a particular task. Then, using runnable load average statistics, the Linux scheduler is modified to track the average load of each task, and to migrate tasks to the best core. High intensity tasks are migrated to the Cortex-A15 core(s) and are also marked as high intensity tasks for more efficient future allocations. Low intensity tasks remain resident on the Cortex-A7 core(s).
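
A minimal Python sketch of this load-tracking and migration logic follows. It is an illustration under stated assumptions, not the actual patch set: the two thresholds, the decay weight and all names are hypothetical, and the real scheduler tracks load with a more elaborate decaying sum.

```python
from dataclasses import dataclass

UP_THRESHOLD = 0.70    # hypothetical: move a task to a big core above this load
DOWN_THRESHOLD = 0.25  # hypothetical: move it back to a LITTLE core below this
DECAY = 0.5            # hypothetical weight given to the load history


@dataclass
class Task:
    name: str
    load_avg: float = 0.0
    cluster: str = "LITTLE"

    def account(self, runnable_fraction: float) -> None:
        """Fold the latest period's runnable fraction into the running average."""
        self.load_avg = DECAY * self.load_avg + (1 - DECAY) * runnable_fraction


def place(task: Task) -> str:
    """Migrate on tracked load; the gap between the thresholds adds hysteresis."""
    if task.cluster == "LITTLE" and task.load_avg > UP_THRESHOLD:
        task.cluster = "big"
    elif task.cluster == "big" and task.load_avg < DOWN_THRESHOLD:
        task.cluster = "LITTLE"
    return task.cluster
```

Using two thresholds instead of one keeps a task whose load hovers around a single cut-off from bouncing between clusters, which matters because each migration has a cost.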

IKS and GTS are now publicly available in Linaro monthly engineering releases for the ARM TC2 Versatile Express hardware, and in Linaro’s interim Long Term Supported Kernel (LSK) build. Both will also be incorporated into the first full Linaro LSK, which will be based on the next Linux Foundation, Greg Kroah-Hartman designated, Long Term Supported (LTS) kernel.

Until GTS functionality is fully upstream, ARM is supporting the big.LITTLE MP patch set for its licensees, leveraging Linaro’s public monthly and Linaro LSK builds, so that it is available to all ARM licensees for product integration and deployment. Linaro also expect to provide a topic branch for the latest work available on the upstream GTS implementation for interested developers.

ARM and Linaro now recommend product development and deployment to be based on the GTS solution. However, there are some cases where hardware limitations or a requirement for the traditional Linux scheduler (for example in some embedded applications) may lead to IKS still being required.

Future Work

Power management software in Linaro is worked on by the Power Management Working Group. Other activities within the Group will enable additional power savings on ARM multi-core devices. One current project worth highlighting is the work being done by Vincent Guittot on small-task packing. Normally the Linux kernel will spread running tasks over all the available CPU cores. On a handset in standby, or even when being used with low activity, there may be a number of housekeeping and other small tasks that run in the background or relatively infrequently and therefore keep cores active unnecessarily. If “small” tasks can be migrated to one core, then the other cores could be made idle or even turned off completely, potentially resulting in significant power savings. This feature is expected to offer improved power management to systems based on symmetric multi-core SoCs (for example dual or quad-core Cortex-A7 or Cortex-A15 parts), as well as big.LITTLE SoCs.
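
The packing idea described above can be sketched as a placement function. This Python example is an assumption-laden illustration, not Vincent Guittot's patches: the function name, the `small_threshold` cut-off and the per-core `capacity` are hypothetical.

```python
def pack_small_tasks(task_loads, num_cpus, small_threshold=0.2, capacity=1.0):
    """Return one task list per CPU; small tasks are packed, the rest are spread.

    task_loads maps a task name to its load, where 1.0 is one fully busy CPU.
    """
    cpus = [[] for _ in range(num_cpus)]
    used = [0.0] * num_cpus
    # Place the heaviest tasks first so that small ones fill the remaining room.
    for name, load in sorted(task_loads.items(), key=lambda kv: -kv[1]):
        if load < small_threshold:
            # Small task: lowest-numbered CPU with room, leaving the others idle.
            target = next(
                (i for i in range(num_cpus) if used[i] + load <= capacity), 0
            )
        else:
            # Larger task: least-loaded CPU, standing in for normal spreading.
            target = min(range(num_cpus), key=lambda i: used[i])
        cpus[target].append(name)
        used[target] += load
    return cpus


background = {"sync": 0.05, "logger": 0.02, "timer": 0.03}
placement = pack_small_tasks(background, num_cpus=4)
```

With only small background tasks, everything lands on the first core and the remaining three receive no work, so they could be idled or powered down.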

While the current big.LITTLE efforts are focused on Cortex-A15 and Cortex-A7, the techniques being implemented today for 32-bit systems are already being run on 64-bit models. We therefore expect to see the GTS software running on 64-bit Cortex-A57 and Cortex-A53 based big.LITTLE SoCs as soon as they become available.

Real Life Results

ARM has published further information on big.LITTLE configurations and performance in a blog entry here [Ten Things to Know About big.LITTLE [Brian Jeff on SoC Design blog of ARM, June 18, 2013]].

The first commercial products based on big.LITTLE are certain international versions of the latest Galaxy S4 phone from Linaro member, Samsung. Samsung-LSI provide an ‘Octa-core’ 4+4 big.LITTLE chip for this phone. As has been publicly noted, the current generation of hardware cannot yet take full advantage of the IKS or the GTS designs because the hardware power-saving core switching feature is implemented on a cluster basis rather than on a per-core or a per-pair basis. Even so, the first big.LITTLE implementation produces performance and power consumption on a par with the latest Qualcomm multi-core Snapdragon processor according to reviews from Engadget, PocketNow and others. Often first implementations of new technology never see the light of day – it is a tribute to Samsung’s engineers that the Exynos 5 is already seeing the Cortex-A15 level of performance with the power saving of the Cortex-A7s in a mass market handset in the very first big.LITTLE iteration.

We look forward to seeing what improvements full use of GTS will bring when used on future production devices from Samsung and others.

More information: Power Management with big.LITTLE: A technical overview [by Steven Willis in SoC blog of ARM, June 20, 2013]

Why all this sudden attention on the Linux Scheduler? [LCE13, Linaro Connect Europe]

12:00 PM – 13:00 PM on Monday, Jul 8, 2013 (IST)

Description

The Linux scheduler is getting a lot of attention in the ARM ecosystem these days. Come to this discussion to find out why.

Several people working on the scheduler or interested in changes to the scheduler will be invited to talk about their requirements, what is the state of their work, who will benefit from it, etc.

Video record of the Why all this sudden attention on the Linux Scheduler? discussion

Minutes of the above discussion

Determinism: problems
———————
* Preemption: interrupts, locking
* Latency
* Scheduling overhead
* Realtime processing
Most of the requirements are coming from LEG/LNG.
Solutions:
    – PREEMPT_RT
    – Adaptive NO_HZ (merged in 3.10)
        Came out of high-performance computing. When there is just one
        task, the scheduler is switched off for that CPU. Results in
        zero scheduler overhead. When the only task finishes – the CPU
        will get into scheduling/idle again.
        There is still once-per-second tick for scheduling. There
        is a patch removing that last remaining bit to make it fully
        tickless.
        We’re not sure yet if all the possible limitations are found –
        there still might be some scheduler overhead left.
        If interrupt handling is offloaded to other cores, caching
        related issues will still affect performance (e.g. serving IO
        interrupts for the task on a different core will require the
        dedicated core to cache the data once again).
    – Deadline
Physical process isolation: none addresses
    – Needed for KVM.
Temporal isolation: all three (with some limitations)
No scheduling overhead: ADAPTIVE NO_HZ only.
Firm/Hard Real-time PREEMPT_RT only
Complexity:
    high for PREEMPT_RT
    low for the rest
Requirements:
all of the above
Power efficiency: history
————————-
* sched-mc (got removed)
* big.LITTLE MP patches implementing GTS (ARM)
* Packing Small Tasks (Linaro/ARM)
    Pack all small background tasks on as few small cores as possible
    to conserve power.
    Intel approach does not care about which core is selected as the
    best one (Turbo Mode is effectively converting the core into a BIG
    core, while all the other cores are becoming little ones). Task
    migration is expensive – this approach helps avoiding it.
* Power aware scheduling (Intel)
    Discussions were lasting for a while and then Ingo Molnar requested
    an integral solution (not a set of independent bits).
    He made a good point. What we have is an SMP legacy implementation.
    Are we starting from scratch because of that?
    It is going to be a significant change. We need to re-think as it’s
    not SMP case anymore. b.L is not a new architecture – Intel already
    does that but differently.
    The task is to find the most efficient way of performing the work
    needed. The best place to make those decisions is the scheduler.
    Power efficiency – proposal (from ARM)
    ————————————–
    Separate process and power scheduler (ARM). This is the first step
    to get to the fully integral scheduler in the future. Helps fighting
    with the complexity at hand. In this case there are certain
    limitations – one of the schedulers will be leading while the second
    one will be limited.
    That doesn’t work well for Intel CPUs (no pre-configured small/BIG
    cores).
    Issues:
    – Topology
        Missing:
        – Frequency domains, which CPUs are affected. That would be
          useful for the scheduler.
    – Idle + DVFS
        Missing:
        – information about the cost of using a certain core at certain
          DVFS operation point to perform a certain amount of work.
    – Thermal
        The idea is to keep an eye on the temperature trend to avoid
        cases when whole cores are needed to be temporarily shut down to
        cool them down.
        GPU contribution into the thermal budget should also be
        considered.
    Trying to control DVFS from the scheduler. Patches are expected very
    soon.
Q: How much of the improvements are we looking for (power wise)?
A: Something that will get upstream. 😀

Linux 3.10 [by Linus Torvalds on Linux kernel mailing list, June 30, 2013]
Linux kernel 3.10 arrives with ARM big.LITTLE support [Engadget, July 1, 2013]

Thanks to Linus Torvalds’ figurative stroke of the pen, the Linux kernel 3.10 is now final, paving the way for its inclusion in a bevy of Linux distributions, and even offshoots such as Android and Chrome OS. The fresh kernel brings a good number of changes, such as timerless multitasking, a new caching implementation and support for the ARM big.LITTLE architecture. In simplistic terms, the new multitasking method should help improve performance and latency by firing the system timer only once per second, rather than 1,000 times, when tasks are running. Meanwhile, users with both traditional hard drives and SSDs will find performance benefits from bcache, which brings writeback caching and a filesystem agnostic approach to leveraging the SSD for caching operations. Also of significance, Linux kernel 3.10 enhances ARM support by including the big.LITTLE architecture, which combines multiple cores of different types (commonly the Cortex-A7 and Cortex-A15) that focus on either power savings or performance. The full list of improvements is rather lengthy, but if you feel like nerding out with the changelog, just grab a caffeinated beverage and get to it.

Linaro 13.06 Released! [by Amber Graner on Linaro Blog, June 27, 2013]

The Linaro 13.06 release is now available for download!

It’s been a very active cycle for the Builds and Baselines team, reporting that the Continuous Integration (CI) loop for the Linaro Stable Kernel (LSK) Android proof of concept which is based on 3.9.6 kernel version was set up and includes the big.LITTLE IKS and MP patches (also called beta patchset). Support for Kernel CI loop with Android filesystem was added to android-build and CI loop was set up to track the ARM Landing Team (LT) integration tree. The HiSilicon member build with complete CI loop was set up and now tracks the LT kernel tree.

ARM Cortex-A12 CPU cores and Mali-T622 GPU cores with Process Optimization Packs (POPs), plus Mali-V500 video block for mid-range mobile devices of the end of 2014

in order to cover (very competitively) the hole existing in ARM-based SoCs so far:

Arm unveiled the Cortex A12 processor during a news conference at Computex in Taipei on June 3, 2013.

AnandTech’s judgement about the Cortex-A12 announcement:

… The Cortex A9 is too slow to compete with the likes of Intel’s Atom and Qualcomm’s Krait 200/300 based SoCs. The Cortex A15 on the other hand outperforms both of those solutions, but at considerably higher power and die area requirements. … The Cortex A15 island in Samsung’s Exynos 5 Octa occupies 5x the die area as the A7 island, and consumes nearly 6x the power. In exchange for 5x the area and 6x the power, the Cortex A15 offers under 4x the performance. It’s not exactly an area or power efficient solution, but a great option for anyone looking to push the performance envelope. Today, ARM is addressing that hole with the Cortex A12. …
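
The efficiency gap implied by these ratios can be worked out directly. The short Python calculation below takes the quoted figures at face value (5x area, nearly 6x power, under 4x performance) and treats them as rounded upper bounds; the variable names are only illustrative.

```python
# Cortex-A15 island relative to the Cortex-A7 island, per the quoted figures.
area_ratio = 5.0    # "5x the die area"
power_ratio = 6.0   # "nearly 6x the power"
perf_ratio = 4.0    # "under 4x the performance", so an upper bound

perf_per_area = perf_ratio / area_ratio    # at most 0.8 of the A7's value
perf_per_watt = perf_ratio / power_ratio   # at most about 0.67 of the A7's value

print(f"performance per area: {perf_per_area:.2f}x that of the A7 island")
print(f"performance per watt: {perf_per_watt:.2f}x that of the A7 island")
```

On these numbers the A15 island delivers at best about two-thirds of the A7 island's performance per watt, which is the gap a mid-range core is meant to fill.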
Asked at a Taipei news conference about the future of Intel’s x86 architecture, rival Arm said it still sees life in the platform.

AnandTech’s judgement about Mali-T622 and Mali-V500 announcements:

… The Mali-T622 is a 2-core implementation of the 2nd generation Mali-T600 GPU architecture that we first learned about with the 8-core T628. Each shader core features two ALUs, an LSU and a texture unit. … On the video front, the Mali-V500 video encode/decode block is a multi-core engine used for all video acceleration. The V500 allegedly supports up to 100Mbps High Profile H.264, although details are scarce on more specifics. ARM claims support for up to 120 fps 4K video decode with an 8-core V500 implementation. Mali-V500 also features a protected video path, necessary for gaining content owner support for high-bitrate/high-resolution video decode. The V500 also supports ARM’s Frame Buffer Compression (AFBC), a lossless compression algorithm that can supposedly reduce memory bandwidth traffic by up to 50%. There’s presently no frame buffer compression in Mali GPUs today, but ARM expects to eventually roll AFBC out to Mali GPUs as well.

Announcement information from ARM:

image
POP IP for the Cortex-A12 processor core 
– The only implementation solution that is co-developed along with the processor itself
– The processor RTL and the POP implementation feed off each other and are thoroughly co-optimized
– Lower the risk to end customers and those designers starting from scratch with a new processor core
– Save months of effort optimizing the implementation
POP IP for Mali-T622 GPU core 
– Eliminates the iterative guess work required to find the most optimal implementation
– Enables best-in-class PPA and frames per second metrics coupled with highly flexible implementation
More information: POP IP for the Cortex-A12 Processor: Enabling the Next Billion Smartphones [June 3, 2013]

image
New ground-up design for mid-range mobile
– OoO, Dual-Issue, 11 stage dynamic length pipeline
– Tightly integrated, high-performance NEON and FPU units
Perfectly balanced design for best efficiency
– Highly optimized L1 and L2 memory sub-system
– Ideal for current and upcoming mobile workloads
Flexible interface options to adapt for use-case
– 128-bit AMBA ACE – System coherency with CPUs or GPUs
– Accelerator Coherency Port (ACP) – I/O coherency with DMA
– Peripheral Port – For low-latency peripherals eliminating DDR traffic congestion
More information: Cortex-A12: Diversification in the Mobile Market – Serving the Mid-Range [June 3, 2013]

image
Smallest GPU Compute solution in the market
– Renderscript Compute and OpenCL 1.1 Full Profile
50% energy-efficiency improvements over Mali-T600 series
Richest user experience with OpenGL ES 3.0
More information: Mali-T622 – Bringing Full Profile GPU Compute to mid-range devices [June 3, 2013]

image 
1080p60 HD encode/decode
Optimized for lowest cost and power
– AFBC gives 50% lower memory bandwidth
TrustZone secure video path
– Premium content protection
More information: A new branch for the Mali family tree: Mali Video, featuring the Mali-V500  [June 3, 2013]

ARM Targets 580 Million Mid-Range Mobile Devices with New Suite of IP [press release, June 3, 2013]

News Highlights:

  • Faster time to market and less design risk with suite of IP including: 
    ARM Cortex-A12 processor, Mali-T622 GPU, Mali-V500 video solution and POP IP technology;
  • 580 million mid-range smartphones and tablets are forecast to be sold in 2015
  • Cortex-A12 processor delivers 40 percent more performance than Cortex-A9 and brings premium features such as virtualization to the mid-range mobile device market; efficiency profile also makes it ideal for DTV and home networking;
  • Cortex-A12 processor brings optimum performance and maximum efficiency of big.LITTLE processing to mid-range smartphones and tablets;
  • Mali-T622 GPU offers an efficient and qualified OpenGL ES 3.0 solution and smallest Full Profile GPU Compute solution, putting even greater compute power into the hands of more mobile users;
  • Mali-V500 video IP solution reduces system bandwidth and power, while enabling the protection of premium video content with TrustZone support.

The essence is that the first Cortex-A12 based SoCs are expected by mid-2014
– for mid-range devices (smartphones and tablets) in the $200 … $350 price range by late 2014 to early 2015  
– with Cortex-A7/A15 architectural compatibility, thus in big.LITTLE configurations with either core, supporting 40-bit addressing (up to 1 TB) and virtualization
– plus providing the highest efficiency in pairing with Cortex-A7 core
– as the follow-up with +40% performance to the current SoCs for mid-range devices based on Cortex-A9 SoCs

The SoC ramp-up of about a year compares with no less than two years of ramp-up for Cortex-A9 based SoCs. This is the result of significant progress with ARM's Process Optimization Pack technology, which for the first time was developed along with the processor and GPU cores themselves. It is available now on the TSMC 28HPM process technology for lead partners. Six of them are already starting their SoC designs. Moreover, it will also be available on the GLOBALFOUNDRIES 28-SLP HKMG process technology in Q4 2013. So it is also the first time that such complete sourcing from two foundries will be available to SoC vendors so early on. GLOBALFOUNDRIES is even going to achieve up to 70 percent higher performance in comparison to a Cortex-A9 processor core using 40nm process technology. Competition between those two foundries will understandably be very strong, as the 2015 mid-range smartphone and tablet market is expected to be no less than 580 million units.

In comparison the Cortex-A9 core:
– was announced in October 2007 and released in 2008
– now contributes to approximately one-third of all smartphone shipments worldwide
– saw real development opportunities begin in H2 2009, with the possibility to go even against Intel Atom (source: Computex 2009 – Warren East Presentation [ARM Holdings, June 1, 2009]):
image
with improving Cortex-A9 performance on 45nm process achieved through:
– 56% improvement from processor and physical IP optimisations
– 44% improvement from other techniques
The first SoC products based on 45nm technology came in 2011, namely:
NXP PNX 847x/8x/9x set-top box SoCs sampling in January 2010. However a month later the business related to these products was sold to Trident Microsystems (see the PNX8490/PNX8491 datasheet of February 2010) and as Trident had experienced continuing operating losses it filed for bankruptcy in January 2012. Its set-top box SoC business had been taken over by Entropic Communications, Inc. in April 2012. Although only the PNX8475 is currently offered by Entropic the original Cortex-A9 related SoC know-how is flourishing quite well there (see also: 1, 2, 3 and 4).
Samsung Orion application processor, later renamed into Samsung Exynos 4210 then further into Exynos 4 Dual, announced in September 2010 for sampling in Q4 2010 and mass production in H1 2011. It first came out with the Samsung Galaxy S II smartphone announced in February 2011 for May 2011 delivery. Other Samsung smartphone and tablet products then followed.
Texas Instruments OMAP 4430 and OMAP 4440 (later renamed OMAP 4460) application processors announced in February 2009 for sampling in H2 2009 and expected production by the second half of 2010, but actually debuted a year later in February 2010 with sampling available and expected production in H2 2010. The first product based on OMAP 4430 was the BlackBerry PlayBook tablet announced in September 2010 for early 2011 availability but becoming available in June 2011 only. Smartphone products from Motorola (a lot, also a few tablets) and LG (a few) followed that, as well as a number of tablet products from Archos and most notably the Kindle Fire from Amazon, and the Nook from Barnes & Noble.

ARM is representing and projecting the evolution of the market since then as follows:
image
More information about that was provided in:
Cortex-A12: Diversification in the Mobile Market – Serving the Mid-Range [ARM Smart Connected Devices blog, June 3, 2013]

Mobile devices have become indispensable in North America, Europe, and much of Asia, and are becoming the primary compute platforms for people in emerging markets. We are entering a new era of computing, the post-PC era. ARM® technology has been at the heart of the mobile revolution for over twenty years and continues to be the bedrock of all innovation and change in this space.
Mobile devices, such as smartphones and tablets, are connecting billions of people. In 2013, we are expecting:
– Over 1 billion smartphones forecasted to ship*
– Smartphones for <$50 and Tablets >$800
– Tablets out-ship notebook PCs
What becomes clear when looking at mobile devices is that we are seeing segmentation into multiple markets, which is an opportunity for growth for ARM partners:
– Premium devices: Price range > $400
– Mid-range devices: Price range between $200 and $350
– Entry-level devices: Price range up to $150
image
Source: Mixture of ARM and Gartner Estimates
Premium smartphones and tablets receive a great deal of attention, but it is the entry-level and mid-range markets that are expected to grow the fastest over the next years. ARM delivered the Cortex®-A7 processor in the fourth quarter of 2011, and it is now shipping in large volumes in low-cost, quad-core devices. It will be followed by the Cortex-A53 processor, which is soon to be released to lead partners. Both are high-efficiency processors built on simple, in-order, eight-stage pipelines that are tuned to deliver very good performance for their size. In the mid-range mobile device market, the industry had tremendous success with devices based on the higher-performance Cortex-A9 processor, which uses a partially out-of-order, nine-stage pipeline to achieve high performance tuned to the power constraints of smartphones. The Cortex-A9 processor was released in 2008 and now contributes to approximately one-third of all smartphone shipments worldwide.
The market segmentation is driving the diversification in mobile and resulting in many different requirements needed to achieve the highest performance and lowest power within a sustained thermal envelope. These requirements make it mandatory to provide the functionality previously available only in premium devices, but within the power budgets of mid-range devices. Looking at how to serve those markets, it is clear that one size does not fit all anymore.
Today ARM is introducing the Cortex-A12 processor, the highest performance mid-range CPU that is specifically designed for the next-generation mid-range mobile market. The Cortex-A12 processor brings its own mix of high performance and energy efficiency to 2014 SoC designs: more performance than the Cortex-A9 processor with the same mobile-tuned power efficiency. The Cortex-A12 processor is designed to deliver the best mobile experience:
– Highest performance at lowest power consumption and cost
– Highest efficiency within mid-range thermal envelopes, i.e. achieve highest performance at uncompromised area
– Premium feature set in mid-range mobile
The Cortex-A12 processor is the successor to the Cortex-A9 processor and increases single-thread performance by 40 percent, while matching the best-in-class energy efficiency. Measured in 28nm, the Cortex-A12 processor is about 30 percent smaller in area compared to the Cortex-A9 processor in 40nm technology using the same configuration. Additionally, the Cortex-A12 processor brings today’s premium smartphone features into the mid-range, allowing new use cases and great mobile experiences. Some key added features include:
– big.LITTLE™ processing enables the extension of the dynamic range of the Cortex-A12 processor with the addition of the Cortex-A7 processor
– Virtualization and TrustZone® security support enabling new use cases like BYOD (bring your own device)
– 1TB addressable memory, providing close to no boundaries on memory space
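As a quick check on the 1TB figure, and assuming it follows from the 40-bit addressing mentioned earlier in this post, the arithmetic can be sketched in Python. The function name is only illustrative, and "TB" is read here as the binary tebibyte.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits

TIB = 1024 ** 4  # one tebibyte in bytes
GIB = 1024 ** 3  # one gibibyte in bytes

forty_bit = addressable_bytes(40)       # assumed width behind the 1TB claim
thirty_two_bit = addressable_bytes(32)  # classic 32-bit physical address space

print(forty_bit // TIB)             # size of the 40-bit space in TiB
print(thirty_two_bit // GIB)        # size of the 32-bit space in GiB
print(forty_bit // thirty_two_bit)  # how many times larger the 40-bit space is
```

Under that assumption the 40-bit space is exactly 1 TiB, which is 256 times the 4 GiB reachable with 32-bit addresses.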
The Cortex-A12 processor extends the performance capability in mid-range devices without sacrificing energy efficiency when combined with the Cortex-A7 processor as a big.LITTLE CPU subsystem. big.LITTLE processing provides a highly efficient, high-performance processing solution that can scale to many different use cases. The first iterations of big.LITTLE processing featured the Cortex-A15 and Cortex-A7 processors for high-end solutions. Now, the Cortex-A12 processor is bringing big.LITTLE processing to increase the dynamic range of the mid-range by enabling SoC designers to push the Cortex-A12 processor further while using the Cortex-A7 processor to reduce power well below levels of the Cortex-A9 Processor. This results in an ideal combination of compute resource for efficient workload distribution, running lightweight tasks on the Cortex-A7 processor and high-performance tasks on the Cortex-A12 processor. Early results show up to 2x increased efficiency.
Even though it is designed for mid-range smartphone and tablet devices, the Cortex-A12 processor leads with an excellent efficiency profile, making it an ideal fit for other use cases like home networking, residential gateway and auto infotainment systems.
ARM has also designed the Cortex-A12 processor to work efficiently with a complementary family of high performance, low power ARM CoreLink™ System IP components:
image 
The system diagram shown above illustrates the system IP components that will typically support the Cortex-A12 processor in a mobile SoC. To deliver effortless 1080p30 graphics with 1080p encode/decode the system also features a Mali™-T622 GPU supporting OpenGL/ES 3.0 and a Mali-V500 video accelerator.
The CoreLink CCI-400 cache coherent interconnect provides an IO coherent channel with Mali and opens up a number of exciting possibilities for offload and acceleration of tasks. When combined with a Cortex-A7 processor (not shown) on the ACE port, CCI-400 also allows big.LITTLE operation with full L2 cache coherency between the Cortex-A12 and Cortex-A7 processors. Efficient voltage scaling and power management is enabled with the CoreLink ADB-400 enabling efficient DVFS control of the Cortex-A12 processor.
CoreLink MMU-500 provides a hardware accelerated, common memory view for all SoC components and minimizes software overhead for virtual machines to get on with other system management functions. In this system, the Cortex-A12 processor also enjoys a secure, optimized path to memory to further enhance its market-leading performance with the aid of CoreLink TZC-400 TrustZone address space controller and DMC solution. All interconnect components and the ARM DMC guarantee bandwidth and latency requirements by utilizing in-built dynamic QoS mechanisms.
ARM POP™ IP supports the physical implementation of the Cortex-A12 processor and Mali GPU to enable best power, performance, and area so critical to success in the highly competitive mid-range SoC market. ARM CoreSight™ debug and trace on-chip hardware, coupled with the ARM DS-5™ software development toolchain, enable the debug of random, time-related software bugs, and the non-intrusive analysis of critical areas of software. The ARM Development Studio 5 (DS-5™) toolchain also makes use of performance counters embedded in the processor, graphics processor and interconnect to enable system-wide optimization.
The ARM Cortex-A12 processor is the highest-performance, mid-range CPU. It is specifically designed for the mid-range mobile market, and is broadly supported by a range of other ARM technology IP including ARM system IP, POP IP and development tools to enable ARM Powered® solutions that contribute to the very best user experience in terms of responsiveness and battery life. At the same time, it allows ARM partners to accelerate time to market for mid-range SoCs, while freeing development time to add their own differentiation. The Cortex-A12 is a highly tuned processor that will bring the performance of high-end mobile devices into mid-range smartphones and tablets, as well as into other great market opportunities we haven’t even considered.
* Source: Bank of America

ARM and GLOBALFOUNDRIES to Optimize Next-Generation ARM Mobile Processors for 28nm-SLP Process Technology [press release, June 3, 2013]

New ARM POP technology provides core-hardening acceleration for Cortex-A12 and Cortex-A7 processors
Milpitas, Calif. and Cambridge, UK, June 3, 2013 – In conjunction with the launch of the ARM®  Cortex®-A12 processor, ARM and GLOBALFOUNDRIES today announced new power, performance and cost-optimized POP™ technology offerings for the ARM Cortex-A12 and Cortex-A7 processors for GLOBALFOUNDRIES 28nm-SLP High-K Metal Gate (HKMG) process technology. The Cortex-A12 processor was introduced by ARM today as part of a suite of IP targeting the rapidly growing market for mid-range mobile devices.
The companies will combine ARM’s next-generation mobile processor and POP IP with GLOBALFOUNDRIES 28nm-SLP HKMG process solution, enabling a new level of system performance and power efficiency with the optimum economics necessary to serve the mid-range mobile device market.   The new initiative builds on the existing robust ARM Artisan® physical IP platform and POP IP for the Cortex-A9 processor already available on GLOBALFOUNDRIES 28nm-SLP, signifying another milestone in the multi-year collaboration between ARM and GLOBALFOUNDRIES.
Central to this increase in functionality for mid-range mobile devices is the new ARM Cortex-A12 processor. The Cortex-A12 processor provides a 40 percent performance uplift and direct upgrade path from the incredibly successful Cortex-A9 processor, while matching the energy efficiency of its predecessor. The Cortex-A12 processor provides best-in-class efficiency as a standalone solution, but additionally supports the innovative big.LITTLE™ processing technology with the Cortex-A7 processor, bringing this energy-efficient technology to the mid-range.  GLOBALFOUNDRIES 28nm-SLP process technology and associated ARM POP IP for the Cortex-A12 processor enables up to 70 percent higher performance (measured single-thread performance) and up to 2x better power efficiency in comparison to a Cortex-A9 processor using 40nm process technology. Designers can achieve even higher performance by trading off for lower power efficiency, depending on their application needs. Click here for more information on the Cortex-A12 processor.
The newest POP technology enables customers to accelerate core-hardening of Cortex-A12 and Cortex-A7 processors on GLOBALFOUNDRIES 28nm-SLP HKMG process. POP IP for Cortex processors has successfully enabled ARM-based SoCs with more than 30 different licenses since being introduced over three years ago. POP IP is composed of three elements necessary to achieve an optimized ARM processor implementation: core-specific tuned Artisan physical IP logic libraries and memory instances, comprehensive benchmarking reports, and implementation knowledge that detail the methodology used to achieve the result, to enable the end customer to achieve the same implementation quickly and at low risk.
“With 580 million mid-range smartphones and tablets forecast to be sold in 2015[i], consumers are increasingly looking for the right combination of performance, low power and cost effectiveness,” said Dr. Dipesh Patel, executive vice president and general manager, Physical IP Division at ARM. “With the Cortex-A12 processor and suite of IP announced today, ARM is delivering an optimized system solution leveraging the most innovative technologies available for this market. The POP IP solution on GLOBALFOUNDRIES 28nm-SLP helps designers balance the performance, power and cost tradeoffs to achieve their targets for this growing market.”
GLOBALFOUNDRIES 28nm-SLP technology is ideally suited for the next generation of smart mobile devices, enabling designs with faster processing speeds, smaller feature sizes, lower standby power and longer battery life. The technology is based on GLOBALFOUNDRIES’ “Gate First” approach to High-K Metal Gate (HKMG), which has been in volume production for more than two years. The technology offers a combination of performance, power efficiency and cost that is ideally suited for the mid-range mobile market.
“GLOBALFOUNDRIES is committed to a deep relationship with ARM to enable best-in-class solutions for our mutual customers. Our collaboration on the ARM Cortex-A12 processor implementation is a direct result of this focus and collaboration,” said Mike Noonen, executive vice president of Marketing, Sales, Design and Quality at GLOBALFOUNDRIES.
GLOBALFOUNDRIES’ next-generation 14nm-XM FinFET technology is expected to bring another dimension of enhanced power, performance and area for ARM mobile processors. A Cortex-A9 processor implemented on 14nm-XM technology, using 9-track libraries, is projected to enable a greater than 60 percent increase in frequency at constant power, or a decrease of more than 60 percent in power consumption at constant performance, when compared to implementation on 28nm-SLP technology using 12-track libraries. Similar results are expected for Cortex-A12 processor implementations. Click here for more details on GLOBALFOUNDRIES’ 14nm-XM FinFET technology.
For further discussions about GLOBALFOUNDRIES process technologies or ARM IP offerings please visit the companies’ respective exhibits at the Design Automation Conference (DAC), June 3-5, 2013 in Austin, Texas. ARM is located in booth 931, and GLOBALFOUNDRIES can be found at booth 1314.

The state of big.LITTLE processing

Complementary post reminder: Eight-core MT6592 for superphones and big.LITTLE MT8135 for tablets implemented in 28nm HKMG are coming from MediaTek to further disrupt the operations of Qualcomm and Samsung [‘Experiencing the Cloud’, July 20, 2013], from which I include the following excerpts, as they relate directly to the content given here as well:
There are also two software models now available, that ARM and Linaro have developed to enable control of workloads, performance, and power management on big.LITTLE SoCs. … The second is the Global Task Scheduling (GTS) [also known as big.LITTLE MP] software developed (and now named) by ARM.
Until GTS functionality is fully upstream, ARM is supporting the big.LITTLE MP patch set for its licensees, leveraging Linaro’s public monthly and Linaro LSK builds, so that it is available to all ARM licensees for product integration and deployment. Linaro also expect to provide a topic branch for the latest work available on the upstream GTS implementation for interested developers.
ARM and Linaro now recommend product development and deployment to be based on the GTS solution. However, there are some cases where hardware limitations or a requirement for the traditional Linux scheduler (for example in some embedded applications) may lead to IKS still being required.
Real Life Results
ARM has published further information on big.LITTLE configurations and performance in a blog entry here [Ten Things to Know About big.LITTLE [Brian Jeff on SoC Design blog of ARM, June 18, 2013]].
The first commercial products based on big.LITTLE are certain international versions of the latest Galaxy S4 phone from Linaro member, Samsung. Samsung-LSI provide an ‘Octa-core’ 4+4 big.LITTLE chip for this phone. As has been publicly noted, the current generation of hardware cannot yet take full advantage of the IKS or the GTS designs because the hardware power-saving core switching feature is implemented on a cluster basis rather than on a per-core or a per-pair basis. …
End of the complementary post reminder

The first big.LITTLE device (Samsung Galaxy S4, Exynos 5 Octa version) was announced in mid-March and will hopefully be available from the end of April at the earliest, and in a few countries only (the US is one of them). The price is also way too high: $1,379 unlocked on Amazon. 70% of the first 10M S4 smartphones will come with the quad-core Snapdragon 600 instead (seemingly for a price as low as $800). The reason is that Samsung Semiconductor is only just entering 28nm production with this SoC, so it is “scheduled for mass-production in the second quarter of 2013”. While we should therefore probably wait until Q3 for larger-scale availability, it is already time to examine both the product and the form of big.LITTLE processing delivered with it:

Introducing Samsung GALAXY S 4 [Samsung Mobile Press, March 14, 2013]

Developed to redefine the way we live, the GALAXY S 4 makes every moment of our life meaningful. It understands the value of relationships, enables true connections with friends and family, and believes in the importance of effortless experience.
Highly crafted design with a larger screen and battery, thin bezel, housed in a light 130g and slim 7.9mm chassis. The new Samsung GALAXY S 4 is slimmer, yet stronger.
The GALAXY S 4 gets you closer to what matters in life, and brings your world together.
For a richer, simpler and fuller life.
To find out more, click here http://www.samsung.com/galaxys4/

Samsung Introduces the GALAXY S 4 – A Life Companion for a richer, simpler and fuller life [Samsung press release, March 14, 2013] in US: Pre Order with Octa-Core … Will Ship on Date 30 April By Fedex

… Samsung GALAXY S 4 will be available from Q2 globally [in UK: from April 26th but the Qualcomm Quad-Core; in US: Pre Order with Octa-Core … Will Ship on Date 30 April By Fedex] including US, partnering with AT&T, Sprint, T-Mobile, Verizon Wireless, as well as US Cellular and Cricket. In Europe, Samsung GALAXY S 4 is partnering with global mobile operators such as Deutsche Telecom, EE, H3G, Orange, Telenor, Telia Sonera, Telefonica, and Vodafone. …

AP

  • 1.9 GHz [Qualcomm] Quad-Core Processor / 1.6 GHz [Samsung] Octa-Core Processor
  • The selection of AP will differ by market.

70% of first Galaxy S4s to come with Snapdragon 600 CPU. Samsung LSI couldn’t make enough Exynos 5 Octas in time [Unwired.com, March 25, 2013]

70% of the first 10 million Samsung Galaxy S4 production batch will come with Qualcomm Snapdragon 600 CPU, instead of its own Exynos 5 Octa, Korean ETNews reports.
Samsung’s LSI division, responsible for the next generation Exynos CPU, failed to iron out the production and performance issues to have enough chips in time for Galaxy S4 launch. A couple of weeks ago Samsung announced that Exynos 5 Octa applications processor is scheduled for mass production only in Q2 2013, which is too late for the huge volumes of Galaxy S4 shipments that will start in late April.
Last year Samsung already faced production problems with Galaxy S3 and lost a lot of sales in early summer because of it. This year, Sammy doubled the initial sales forecasts for the new flagship and wants to sell 40 million of them in the first three months. So instead of risking chip supply shortages, they are now turning to Qualcomm for Snapdragon 600 CPU, which was initially slated to go mostly to U.S. versions of SGS4.
Taking a step back to fix the production and performance issues of one of the most important parts in your flagship device is a smart thing to do. If you launch your new top of the line phone with serious quality issues, the initial bad press can be fatal to your plans to sell 100 million of them over the product lifecycle.
Going with a tried and true chip like Snapdragon 600, that you know will perform as it should, is the best way for Samsung for now. Especially since most of the users won’t notice the difference and won’t care anyway.

Samsung Announces the Availability of Exynos 5 Octa for New Generation of Mobile Devices [press release, March 15, 2013] (internal name: Exynos 5410)

Samsung Electronics Co., Ltd., a world leader in advanced semiconductor solutions, announced that its new Exynos 5 Octa application processor is scheduled for mass-production in the second quarter of 2013.
As highlighted at CES 2013, the Exynos 5 Octa is the world’s first mobile application processor to implement the new concept of processing architecture, big.LITTLE™, based on the Cortex-A15™ CPU to offer optimal core use. By housing a total of eight cores to draw from (four powerful Cortex-A15™ cores for processing-intense tasks along with Cortex-A7™ quad cores for lighter workloads), the Exynos 5 Octa enables mobile devices to achieve maximum performance. This approach offers up to 70 percent energy saving when performing various tasks, compared to using Cortex-A15™ cores only.

The newest Exynos processor will be manufactured using Samsung’s latest 28-nanometer (nm) HKMG (High-k Metal Gate) low power process and power-saving design, which increases the power efficiency of the processor by minimizing the static current leakage.

The Samsung Exynos 5 Octa enhances the powerful 3D graphics processing capabilities by more than two-times over the Exynos 4 Quad.
With today’s advanced display technology transitioning toward ever higher and sharper resolutions, the Exynos 5 Octa is powerful enough to drive WQXGA (2560×1600) display, the best crystal-clear resolution currently available for mobile devices, enabling users to enjoy crisper video images on their premium smartphones and tablets.
By adopting e-MMC (embedded multimedia card) 5.0 and USB 3.0 interface for the first time in the industry, the new Exynos application processor boasts fast data transfer speed, a feature that is increasingly required to support advanced processing power on mobile devices so that users can fully experience upgraded mobile computing such as faster booting, web browsing and 3D game loading.
The Samsung Exynos 5 Octa incorporates a full HD 60fps (frames per second) video hardware codec engine for 1080p video recording and playback, an embedded 13 mega-pixel 30fps image signal processor interface for high-quality camera functionality, and a 12.8GB/s memory bandwidth interface that enables Full HD Wifi display.

Samsung Exynos at MWC 2013: Exynos 5 Octa Explained [SamsungExynos YouTube channel, March 14, 2013]

This animated display for the Exynos 5 Octa mobile processor was featured in the Samsung Exynos booth at Mobile World Congress 2013. Samsung’s Exynos 5 Octa is the industry’s first ARM® big.LITTLE™-enabled mobile application processor (AP). The Exynos 5 Octa pairs ultra-efficient ARM® Cortex™-A7 (LITTLE) cores with Cortex™-A15 (big) cores designed for the highest performance. This new system-on-chip (SoC) uses LITTLE cores to handle tasks like emailing, light web search and map navigation and uses the big cores for heavy-duty applications like graphic-intensive gaming. Find out more about how Samsung Exynos is driving the discovery of what’s possible: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/index.html

ARM® Big.LITTLE™ Technology Demo on Exynos 5 Octa Reference Tablet at MWC 2013 [SamsungExynos YouTube channel, March 19, 2013]

ARM’s Eric Gowland demoed ARM® big.LITTLE™ processing technology on an Exynos 5 Octa reference tablet in the ARM booth at Mobile World Congress 2013. Gowland showed us the big.LITTLE-enabled Exynos 5 Octa reference platform running a series of benchmarks for tablet activities like web browsing, video playback, graphics rendering and map navigation. In addition to displaying the CPU migration as the processor switched between activities, the demo showed the relative energy usage throughout, highlighting the extreme power efficiency of big.LITTLE architecture. To learn more about ARM® big.LITTLE™ technology, visit our MWC 2013 webpage: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/index.html You can also find more information on ARM’s specialized microsite:http://thinkbiglittle.com/

Samsung Exynos at MWC 2013: Low-Power High K Metal Gate (HKMG) Process Technology [SamsungExynos YouTube channel, March 14, 2013]

Samsung’s Low-Power High K Metal Gate (HKMG) advanced process technology was featured in this animated display inside the Exynos booth at Mobile World Congress 2013. It demonstrates the progression in process technology from 90nm to 28nm, which has resulted in greater speeds and energy-efficiency in Exynos mobile application processors (APs) developed with the technology. For example, the Exynos 5 Octa can offer up to 70% in energy savings thanks to Samsung’s HKMG process. To learn more about Samsung’s HKMG advanced process technology, visit our website: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/index.html

big.LITTLE Processing [ARM technology site, March 20, 2013] [Linaro internal: IKS (In Kernel Switcher)]

ARM big.LITTLE™ processing is an energy saving technology where the highest performance ARM CPUs are combined with the most efficient ARM CPUs in a combined processor subsystem to deliver greater performance at lower power than today’s best-in-class systems. With big.LITTLE processing, software workloads are dynamically and instantly transitioned to the appropriate CPU based on performance needs. This software load balancing is so fast that it is completely seamless to the user. By selecting the optimum processor for each task, big.LITTLE can reduce energy consumption in the processor by 70% or more on light workloads and background tasks, and by 50% for moderately intense work, while still delivering the peak performance of the high performance cores.

More information can be found below or on the Think big.LITTLE microsite

Software

Software can control the allocation of threads of execution to the appropriate core, or in some versions of the software simply move the whole processor context up to big or down to LITTLE based on measured load. There are two software approaches to handling the CPU selection decision, described below. In both software approaches, cache coherence is required to enable the software to quickly move execution from LITTLE to big and from big to LITTLE as appropriate. Cache coherence allows one CPU cluster to look up in the caches of the other CPU cluster, and full hardware cache coherence between the two clusters is key to making big.LITTLE software fast and transparent. Cache coherence can be provided by the ARM CCI-400 cache coherent interconnect or any interconnect that follows the AMBA4 ACE protocol.             

In a big.LITTLE SoC, the OS kernel dynamically and seamlessly moves tasks between the ‘big’ and ‘LITTLE’ CPUs. In reality this is an extension of the operating system power management software in wide use today on mobile phone SoCs.

Most OS kernels already support Symmetric Multi-core Processing (SMP) and those techniques can easily be extended to support big.LITTLE systems. There are two main variants of big.LITTLE software scheduling.

big.LITTLE CPU Migration [Linaro internal: IKS (In Kernel Switcher) or simply the big.LITTLE.Switcher project]

In CPU migration the whole workload of a CPU gets moved to a different CPU once the OS detects that it requires more or less performance. This builds on generic techniques in an OS to wake up and put to sleep CPUs in an SMP system. The key extension is around the detection that a CPU is running at maximum frequency while still requesting further performance, and thus the workload needs to be moved to a ‘bigger’ CPU. Once the workload has reduced, it can be moved back to a ‘smaller’ CPU.
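The decision rule described above can be sketched as a toy governor in Python. This is an illustration under stated assumptions and not the Linaro switcher code: the frequency tables, the load thresholds and the `LogicalCpu` class are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical operating points in MHz; real SoCs define their own tables.
LITTLE_FREQS = [400, 800, 1200]
BIG_FREQS = [800, 1200, 1600]

@dataclass
class LogicalCpu:
    core: str = "LITTLE"  # which physical core currently backs this CPU
    level: int = 0        # index into the active core's frequency table

    def freqs(self):
        return LITTLE_FREQS if self.core == "LITTLE" else BIG_FREQS

    def step(self, load: float, up: float = 0.85, down: float = 0.30) -> None:
        """Apply one governor decision for a measured load in [0, 1]."""
        top = len(self.freqs()) - 1
        if load > up:
            if self.level < top:
                self.level += 1                   # ordinary frequency step up
            elif self.core == "LITTLE":
                self.core, self.level = "big", 0  # already at max: migrate up
        elif load < down:
            if self.level > 0:
                self.level -= 1                   # ordinary frequency step down
            elif self.core == "big":
                # workload has reduced: move everything back to the smaller core
                self.core, self.level = "LITTLE", len(LITTLE_FREQS) - 1

cpu = LogicalCpu()
trace = []
for load in [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]:
    cpu.step(load)
    trace.append((cpu.core, cpu.freqs()[cpu.level]))
print(trace)
```

In this sketch the migration to the big core happens only on the third consecutive high-load sample, once the LITTLE core has run out of frequency headroom, which mirrors the "maximum frequency while still requesting further performance" condition.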

image

This CPU migration software is available today from Linaro [was released to Linaro partners on Dec 20, 2012 as part of Linaro 12.2 release], and is being actively developed by multiple ARM partners [while Linaro continues to fix bugs on it].

big.LITTLE MP [the final name now is Global Task Scheduling (GTS)]

Task migration (aka big.LITTLE MP [as in the Linaro internal project]) detects a high intensity task and will schedule that onto a ‘big’ CPU. Similarly it will detect a low intensity task and move this back to a ‘LITTLE’ core.
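A minimal per-task placement rule of this kind might look as follows in Python. The two thresholds, the `place` function and the sample task names are hypothetical, and the real scheduler patches track load very differently; the sketch only shows the up/down decision with a hysteresis band.

```python
UP_THRESHOLD = 0.80    # hypothetical: move a task to a big core above this load
DOWN_THRESHOLD = 0.30  # hypothetical: move it back to LITTLE below this load

def place(task_load: float, current: str) -> str:
    """Return the cluster a task should run on, given its tracked load."""
    if current == "LITTLE" and task_load > UP_THRESHOLD:
        return "big"
    if current == "big" and task_load < DOWN_THRESHOLD:
        return "LITTLE"
    return current  # between the thresholds: stay put to avoid ping-ponging

# Illustrative tasks with made-up load figures, all starting on LITTLE cores.
tasks = {"game_render": 0.95, "email_sync": 0.10, "ui_thread": 0.50}
placement = {name: place(load, "LITTLE") for name, load in tasks.items()}
print(placement)
```

Unlike the CPU migration model, the decision here is taken per task, so a heavy task and a light task can run at the same time on different clusters.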

image

The advantage of task migration over CPU migration is that a system can benefit from all its CPUs at the same time if the processing demands are extremely high. For example, in a 2x ‘big’ + 2x ‘LITTLE’ system all 4 CPUs can be used at peak demand times, whereas CPU migration would only be able to use 2 CPUs.
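The core-count comparison can be written down as a small Python helper. It assumes that CPU migration pairs each big core with one LITTLE core, so only one core of each pair is active at a time; the function and model names are illustrative.

```python
def usable_cpus(big: int, little: int, model: str) -> int:
    """Peak number of cores that can run simultaneously under each model."""
    if model == "task_migration":
        return big + little      # every core is visible to the scheduler
    if model == "cpu_migration":
        return min(big, little)  # assumed: each big/LITTLE pair acts as one CPU
    raise ValueError(f"unknown model: {model}")

peak = {m: usable_cpus(2, 2, m) for m in ("cpu_migration", "task_migration")}
print(peak)
```

For the 2 + 2 example in the text this gives 2 usable CPUs under CPU migration and 4 under task migration.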

[According to Vincent Guittot at Linaro Connect 2013 (March 4–8) in Hong Kong, Linaro will release the big.LITTLE MP prototype for external testing in mid-2013]

ARM and Linaro have been developing Linux support for both migration models. For more information go to:

Embedded Linux Conference 2013 – In Kernel Switcher [IKS]: A Solution [TheLinuxFoundation YouTube channel, recorded Feb 22, published March 1, 2013], slides are downloadable in PDF format

The Linux Foundation Embedded Linux Conference 2013. In Kernel Switcher: A Solution to Support ARM’s New big.LITTLE Implementation, by Mathieu Poirier. San Francisco, California. The ‘In Kernel Switcher’ (IKS) is a solution developed by Linaro and ARM to support ARM’s new big.LITTLE implementation. It pairs an A7 (LITTLE) and an A15 (big) processor into a logical entity that is then presented to the kernel as one CPU. From there the solution seeks to achieve optimal performance and power consumption by switching between the big and the LITTLE core based on system usage. This session will present the IKS solution. After giving an overview of the big.LITTLE processor we will present the solution itself, how frequencies are masqueraded to the cpufreq core, the steps involved in doing a “switch” between cores and some of the optimisation made to the interactive governor. The session will conclude by presenting the results that we obtained as well as a brief overview of Linaro’s upstreaming plan.
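One way to picture frequencies being "masqueraded" to the cpufreq core is a single virtual frequency table that spans both cores of a pair. The Python sketch below is only a guess at the general idea: the operating points, the 0.5 scaling factor for the A7 and both function names are assumptions, since the description above does not give the actual mapping.

```python
A7_FREQS_MHZ = [350, 600, 1000]   # hypothetical LITTLE operating points
A15_FREQS_MHZ = [600, 900, 1200]  # hypothetical big operating points
A7_SCALE = 0.5  # assumed relative throughput of an A7 vs. an A15 at equal clock

def build_virtual_table():
    """Merge both cores' operating points into one ascending virtual table."""
    table = [(int(f * A7_SCALE), "A7", f) for f in A7_FREQS_MHZ]
    table += [(f, "A15", f) for f in A15_FREQS_MHZ]
    return sorted(table)

def resolve(virtual_mhz: int):
    """Map a virtual frequency requested by the governor to (core, real MHz)."""
    for virt, core, real in build_virtual_table():
        if virt >= virtual_mhz:
            return core, real
    return "A15", A15_FREQS_MHZ[-1]  # clamp requests above the table to the top

for request in (100, 300, 550, 2000):
    print(request, resolve(request))
```

With such a table the governor only ever asks for "more" or "less" frequency, and crossing from the top A7 entry to the bottom A15 entry is what triggers a switch between the two cores of the pair.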

ELC: In-kernel switcher [IKS] for big.LITTLE [LWN.net, Feb 27, 2013]

The ARM big.LITTLE architecture has been the subject of a number of LWN articles (here’s another) and conference talks, as well as a fair amount of code. A number of upcoming systems-on-chip (SoCs) will be using the architecture, so some kind of near-term solution for Linux support is needed. Linaro’s Mathieu Poirier came to the 2013 Embedded Linux Conference to describe that interim solution: the in-kernel switcher.
Two kinds of CPUs
Big.LITTLE incorporates architecturally similar CPUs that have different power and performance characteristics. The similarity must consist of a one-to-one mapping between instruction sets on the two CPUs, so that code can “migrate seamlessly”, Poirier said. Identical CPUs are grouped into clusters.
The SoC he has been using for testing consists of three Cortex-A7 CPUs (LITTLE: less performance, less power consumption) in one cluster and two Cortex-A15s (big) in the other. The SoC was deliberately chosen to have a different number of processors in the clusters as a kind of worst case to catch any problems that might arise from the asymmetry. Normally, one would want the same number of processors in each cluster, he said.
The clusters are connected with a cache-coherent interconnect, which can snoop the cache to keep it coherent between clusters. There is an interrupt controller on the SoC that can route any interrupt from or to any CPU. In addition, there is support in the SoC for I/O coherency that can be used to keep GPUs or other external processors cache-coherent, but that isn’t needed for Linaro’s tests.
The idea behind big.LITTLE is to provide a balance between power consumption and performance. The first idea was to run CPU-hungry tasks on the A15s, and less hungry tasks on the A7s. Unfortunately, it is “hard to predict the future”, Poirier said, which made it difficult to make the right decisions because there is no way to know what tasks are CPU intensive ahead of time.
Two big.LITTLE approaches
That led Linaro to a two-pronged approach to solving the problem: Heterogeneous Multi-Processing (HMP) and the In-Kernel Switcher (IKS). The two projects are running in parallel and are both in the same kernel tree. Not only that, but you can enable either on the kernel command line or switch at run time via sysfs.
With HMP, all of the cores in the SoC can be used at the same time, but the scheduler needs to be aware of the capabilities of the different processors to make its decisions. It will lead to higher peak performance for some workloads, Poirier said. HMP is being developed in the open, and anyone can participate, which means it will take somewhat longer before it is ready, he said.
IKS is meant to provide a “solution for now”, he said, one that can be used to build products with. The basic idea is that one A7 and one A15 are coupled into a single virtual CPU. Each virtual CPU in the system will then have the same capabilities, thus isolating the core kernel from the asymmetry of big.LITTLE. That means much less code needs to change.
Only one of the two processors in a virtual CPU is active at any given time, so the decision on which of the two to use can be made at the CPU frequency (cpufreq) driver level. IKS was released to Linaro members in December 2012, and is “providing pretty good results”, Poirier said.
An alternate way to group the processors would be to put all the A15s together and all the A7s into another group. That turned out to be too coarse as it was “all or nothing” in terms of power and performance. There was also a longer synchronization period needed when switching between those groups. Instead, it made more sense to integrate “vertically”, pairing A7s with A15s.
For the test SoC, the “extra” A7 was powered off, leaving two virtual CPUs to use. The processors are numbered (A15_0, A15_1, A7_0, A7_1) and then paired up (i.e. {A15_0, A7_0}) into virtual CPUs; “it’s not rocket science”, Poirier said. One processor in each group is turned off, but only the cpufreq driver and the switching logic need to know that there are more physical processors than virtual processors.
The virtual CPU presents a list of operating frequencies that encompass the range of frequencies that both A7 and A15 can operate at. While the numbers look like frequencies (ranging from 175MHz to 1200MHz in the example he gave), they don’t really need to be as they are essentially just indexes into a table in the cpufreq driver. The driver maps those values to a real operating point for one of the two processors.
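As a rough sketch of that mapping: the table below treats the virtual frequencies as keys that the driver resolves to a real operating point on one of the two cores. Only the 175MHz to 1200MHz range comes from the talk; every individual entry, the table name and the function name are hypothetical.

```python
# Hypothetical table: virtual frequency (MHz) -> (core type, real frequency in MHz).
# The values are made up for illustration; only the 175..1200 range is from the talk.
VIRTUAL_OPP_TABLE = {
    175: ("A7", 350),
    250: ("A7", 500),
    400: ("A7", 800),
    500: ("A7", 1000),
    600: ("A15", 600),
    800: ("A15", 800),
    1000: ("A15", 1000),
    1200: ("A15", 1200),
}


def resolve_operating_point(virtual_mhz: int):
    """Map a virtual frequency, as seen by the cpufreq core, to a real operating point."""
    try:
        return VIRTUAL_OPP_TABLE[virtual_mhz]
    except KeyError:
        raise ValueError(f"{virtual_mhz} MHz is not a listed virtual frequency")


core, real_mhz = resolve_operating_point(175)
print(core, real_mhz)
```

The point of the indirection is that the cpufreq core only ever sees one monotonic list of values and never needs to know which physical core backs each entry.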
Switching CPUs
The cpufreq core is not aware of the big.LITTLE architecture, so the driver does a good bit of work, Poirier said, but the code for making the switching decision is simple. If the requested frequency can’t be supported by the current processor, switch to the other. That part is eight lines of code, he said.
For example, if virtual CPU 0 is running on the A7 at 200MHz and a request comes in to go to 1.2GHz, the driver recognizes that the A7 cannot support that. In that case, it decides to power down the A7 (which is called the outbound processor) and power up the A15 (inbound). There is a synchronization process that happens as part of the transition so that the inbound processor can use the existing cache. That process is described in Poirier’s slides [PDF], starting at slide 17.
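The switching rule itself is small enough to sketch. This is not Linaro's code, just a minimal illustration of the rule as described: the supported ranges and all names are assumptions chosen so that the 200MHz to 1.2GHz example above works.

```python
# Hypothetical virtual-frequency ranges (MHz) that each core type can serve.
SUPPORTED_RANGE = {
    "A7": (175, 500),
    "A15": (600, 1200),
}


def supports(core: str, virtual_mhz: int) -> bool:
    """Return True if the given core type can serve the requested virtual frequency."""
    low, high = SUPPORTED_RANGE[core]
    return low <= virtual_mhz <= high


def next_core(current: str, requested_mhz: int) -> str:
    """Stay on the current core if it supports the request, otherwise switch."""
    if supports(current, requested_mhz):
        return current
    return "A15" if current == "A7" else "A7"


# Virtual CPU 0 runs on the A7 at 200MHz and is asked for 1.2GHz.
print(next_core("A7", 1200))  # A15
```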
The outbound processor powers up the inbound and continues executing normal kernel/user-space code until it receives the “inbound alive” signal. After sending that signal, the inbound processor initializes both the cluster and interconnect if it is the first in its cluster (i.e. the other processor of the same type, in the other virtual CPU is powered down). It then waits for a signal from the outbound processor.
Once the outbound processor receives the “inbound alive” signal, the blackout period (i.e. time when no kernel or user code is running on the virtual CPU) begins. The outbound processor disables interrupts, migrates the interrupt signals to the inbound processor, then saves the current CPU context. Once that’s done, it signals the inbound processor, which restores the context, enables interrupts, and continues executing from where the outbound processor left off. All of that is possible because the instruction sets of the two processors are identical.
As part of its cleanup, the outbound processor creates a new stack for itself so that it won’t interfere with the inbound. It then flushes the local cache and checks to see if it is the last one standing in its cluster; if so, it flushes the cluster cache and disables the cache-coherent interconnect. It then powers itself off.
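The ordering of the handshake described above can be written down as a plain list of steps. The sketch below only records the sequence as the article describes it; it performs no real power management, and the function and step names are illustrative.

```python
def switch_steps(inbound_first_in_cluster, outbound_last_in_cluster):
    """Return the ordered (actor, action) steps of one outbound -> inbound switch."""
    steps = [
        ("outbound", "power up inbound, keep running kernel/user code"),
        ("inbound", "send 'inbound alive'"),
    ]
    if inbound_first_in_cluster:
        steps.append(("inbound", "initialize cluster and interconnect"))
    steps += [
        ("inbound", "wait for signal from outbound"),
        # Blackout period starts: nothing runs on the virtual CPU.
        ("outbound", "disable interrupts"),
        ("outbound", "migrate interrupts to inbound"),
        ("outbound", "save CPU context"),
        ("outbound", "signal inbound"),
        ("inbound", "restore context"),
        ("inbound", "enable interrupts"),
        # Blackout period ends.
        ("inbound", "resume execution where outbound left off"),
        ("outbound", "create new stack"),
        ("outbound", "flush local cache"),
    ]
    if outbound_last_in_cluster:
        steps.append(("outbound", "flush cluster cache, disable interconnect"))
    steps.append(("outbound", "power off"))
    return steps


for actor, action in switch_steps(True, True):
    print(f"{actor:8s} {action}")
```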
There are some pieces missing from the picture that he painted, Poirier said, including “vlocks” and other mutual exclusion mechanisms to handle simultaneous desired cluster power states. Also missing was discussion of the “early poke” mechanism as well as code needed to track the CPU and cluster states.
Performance
One of Linaro’s main targets is Android, so it used the interactive power governor for its testing. Any governor will work, he said, but will need to be tweaked. A second threshold (hispeed_freq2) was added to the interactive governor to delay going into “overdrive” on the A15 too quickly as those are “very power hungry” states.
For testing, BBench was used. It gives a performance score based on how fast web pages are loaded. That was run with audio playing in the background. The goal was to get 90% of the performance of two A15s, while using 60% of the power, which was achieved. Different governor parameters gave 95% performance with 65% of the power consumption.
It is important to note that tuning is definitely required – without it you can do worse than the performance of two A7s. “If you don’t tune, all efforts are wasted”, Poirier said. The interactive governor has 15-20 variables, but Linaro mainly concentrated on hispeed_load and hispeed_freq (and the corresponding *2 parameters added for handling overdrive). The basic configuration had the virtual CPU run on the A7 until the load reached 85%, when it would switch to the first six (i.e. non-overdrive) frequencies on the A15. After 95% load, it would use the two overdrive frequencies.
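The basic configuration can be pictured as a two-threshold selection. In this sketch the 85% and 95% figures come from the talk, while the constant names, the band labels and the exact handling of the boundary values are assumptions.

```python
HISPEED_LOAD = 85    # percent; switch from the A7 to the A15 at this load
HISPEED_LOAD_2 = 95  # percent; assumed name for the second (overdrive) threshold


def select_band(load_percent: float) -> str:
    """Pick the frequency band for a virtual CPU from its current load."""
    if load_percent < HISPEED_LOAD:
        return "A7"
    if load_percent <= HISPEED_LOAD_2:
        return "A15 non-overdrive"
    return "A15 overdrive"


for load in (50, 85, 95, 99):
    print(load, select_band(load))
```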
The upstreaming process has started, with the cluster power management code getting “positive remarks” on the ARM Linux mailing list. The goal is to upstream the code entirely, though some parts of it are only available to Linaro members at the moment. The missing source will be made public once a member ships a product using IKS. But, IKS is “just a stepping stone”, Poirier said, and “HMP will blow this out of the water”. It may take a while before HMP is ready, though, so IKS will be available in the meantime.

Exynos Octa and why you need to stop the drama about the 8 cores [XDA Developers, March 15, 2013]

I’m going to write this as a guide/information page so we stop as soon as possible the stupid discussions about how 8 cores are useless.
What’s it all about?
The Exynos Octa or Exynos 5410 is a big.LITTLE design engineered by ARM and is the first consumer implementation of this technology. Samsung was their lead partner in terms of bringing this to market first. Renesas is the other current chip designer who has publicly announced a big.LITTLE design.
    • Misconception #1: Samsung didn’t design this, ARM did. This is not some stupid marketing gimmick.

      The point of the design is to meld the advantages of the A7 processor architectures, with its extreme power efficiency, with the A15 architecture, with extreme performance at a cost of power consumption. The A7 cores are slightly slower than an A9 equivalent, but using much less power. The A15 cores are in another ballpark in terms of performance but their power consumption is also extreme on this current manufacturing generation.
      The effective goal is to achieve the best of both worlds. Qualcomm, on the other hand, does this by using their own architecture, which is similar in some design aspects to the A15 architecture but compromises on features and performance to achieve higher power efficiency. The end result for the user can be expressed in 2 measurements: IPC (Instructions per clock), and Perf/W (Performance per Watt).
      In terms of IPC, the A15 leads the pack by quite a margin, followed by Krait 400, Krait 300, Krait 200, A9, A7, and A8 cores, in that order.
      In terms of Perf/W, the A7 leads by a margin, followed by A9’s and the Krait cores, with the A15 at a distant last in terms of efficiency.
      Real-world use
      Of course, the Exynos Octa is the first to use this:

      image

      Currently, the official word seems to be that the A7 cluster is configured to run from 200 to 1200MHz, and the A15 cluster from 200 to 1600MHz.
      There are several use-cases of how the design can be used, and it is purely limited by software, as the hardware configuration is completely flexible.
      In-Kernel Switcher (IKS)
      This is what most of us will see in our consumer products this year. Effectively, you only have a virtual quad-core processor. The A15 cores are paired up with the A7 cores. Each A15 has a corresponding A7 “partner”. Hardware wise, this pair-up has no physical representation as provided by an actual die-shot of the Exynos Octa.
      The IKS does the same thing as a CPU governor. But instead of switching CPU frequency depending on the load, it will switch between CPUs.

      image    image

      Effectively, you are jumping from one performance/power curve to another. And that’s it. Nothing more, nothing less.
      The actual implementation is a very simple driver on the side of the kernel which measures load and acts much like a CPU governor.
      [PhoneArena YouTube channel, Feb 25, 2013] For more details, check out our web site: http://www.phonearena.com/ PhoneArena presents a video demonstration of the new Samsung Exynos 5 Octa chipset – the manufacturer’s first octa-core processor! As you can imagine, the Exynos 5 Octa is very new and not available in any handset yet, but we expect it to make an appearance in the Galaxy S IV! So, it’s definitely worth checking!
      The above is a demonstration; you can see how at most times the A7 cores are used for video playback, simple tasks, and miscellaneous computations. The A15 cores will kick in when there is more demanding load being processed, and then quickly drop out again to the A7 cores when it’s not doing much anymore.
      • Misconception #2: You DON’T need to have all 8 cores online, actually, only maximum 4 cores will ever be online at the same time.
      • Misconception #3: If the workload is thread-light, just as we did hot-plugging on previous CPUs, big.LITTLE pairs will simply remain offline under such light loads. There is no wasted power with power-gating.
      • Misconception #4: As mentioned, each pair can switch independently of other pairs. It’s not the whole cluster that switches between A15 and A7 cores. You can have only a single A15 online, together with two A7’s, while the fourth pair is completely offline.
      • Misconception #5: The two clusters have their own frequency planes. This means A15 cores all run on one frequency while the A7 cores can be running on another. However, inside of the frequency planes, all cores run at the same frequency, meaning there is only one frequency for all cores of a type at a time.
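The points above can be summed up in a small model of the four IKS pairs. This is purely illustrative: the state labels and the function name are made up, and each pair is assumed to be in exactly one of three states.

```python
from collections import Counter

VALID_STATES = {"A15", "A7", "off"}


def online_cores(pair_states):
    """Count online cores per type for a list of per-pair states."""
    for state in pair_states:
        if state not in VALID_STATES:
            raise ValueError(f"invalid pair state: {state}")
    # At most one core per pair is ever online, so four pairs give at most 4 cores.
    return dict(Counter(s for s in pair_states if s != "off"))


# One A15 online, two A7s online, fourth pair completely offline.
print(online_cores(["A15", "A7", "A7", "off"]))
```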
      Heterogeneous Multi-Processing (HMP)
      This is the other actual implemented function mode of a big.LITTLE CPU. In this case, all 8 cores can be used simultaneously by the system.
      This is a vastly more complex working mechanism, and its implementation is also an order of magnitude more sophisticated. It requires the kernel scheduler to actually be aware of the difference between the A7 and A15 cores. Currently, the Linux kernel is not capable of doing this and treats all CPUs as equals. This is a problem since we do not want to use the A15 cores when a task can simply be processed on an A7 core with a much lower power cost.
      The Linaro working-group already finished the first implementation of the HMP design as a series of patches to be applied against the Linux 3.8 kernel. What they did is to make the scheduler smart enough to be able to track the load of single process entities, and with that information to schedule the threads smartly on either the A7 cores or the A15 cores. This achieves much lower latency in terms of switching workloads, or better said, switching the environments (CPUs) to the respective work-loads, and exposes the full processing capabilities of the silicon as all cores can be used at once.
      You can follow the advancements of this in the publications of the Linaro Connect summits that happen every few months. The code was only published in the middle of February this year for the first working implementation equivalent in power consumption to the IKS.
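To make the idea of per-task load tracking more concrete, here is a toy sketch. It is not the Linaro patch set: the decaying average, both thresholds and all names are assumptions picked only to show how a tracked load could drive placement on the A7 or A15 cores.

```python
UP_THRESHOLD = 0.7    # hypothetical: move a task to an A15 above this load
DOWN_THRESHOLD = 0.3  # hypothetical: move it back to an A7 below this load


def update_load(previous: float, sample: float, decay: float = 0.5) -> float:
    """Blend the newest utilisation sample (0..1) into a decaying average."""
    return decay * previous + (1.0 - decay) * sample


def place(current_cluster: str, load: float) -> str:
    """Choose a cluster for the task; between the thresholds it stays put."""
    if load > UP_THRESHOLD:
        return "A15"
    if load < DOWN_THRESHOLD:
        return "A7"
    return current_cluster


load, cluster = 0.0, "A7"
for sample in (1.0, 1.0, 1.0, 0.0, 0.0, 0.0):
    load = update_load(load, sample)
    cluster = place(cluster, load)
    print(f"load={load:.3f} -> {cluster}")
```

Using two thresholds instead of one is a design choice in this sketch: it keeps a task whose load hovers around a single cut-off from bouncing between clusters.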
      • Misconception #6: Yes the CPU is a true 8-core processor. It’s just not being used as such in its initial software implementations

      big.LITTLE In Kernel Switcher [IKS] by Nicolas Pitre and Viresh Kumar [Charbax YouTube channel, March 16, 2013]

      Nicolas Pitre and Viresh Kumar are part of the core team from Linaro that is working on developing future solutions for the latest ARM architecture: big.LITTLE. Here they discuss some of the internals of the famous IKS solution. They are joined by Naresh Kamboju, who is part of the QA team working for Linaro. This team, along with a few others, received the “Outstanding team for 2012” award for their work on IKS. Filmed at Linaro Connect 2013 [March 4-8] in Hong Kong.

      Vincent Guittot on the Linaro big.LITTLE MP work [Charbax YouTube channel, April 1, 2013]

      Vincent Guittot, Linaro assignee from ST-Ericsson, talks about the work that is being done at Linaro to extend the Linux kernel to support ARM’s big.LITTLE MP architecture, building on the features provided by the big.LITTLE Switcher project. The most powerful use model of big.LITTLE is called MP and enables the use of all physical cores at the same time. Threads that have high priority and/or are computationally intensive can in this case be allocated to the A15 cores, while threads that have lower priority or are less computationally intensive, such as background tasks, can be performed by the A7 cores. Filmed at Linaro Connect 2013 [March 4-8] in Hong Kong.

      The future of the semiconductor IP ecosystem

      December 13 Report:
      – Intel’s next-gen SoC manufacturing process will be able to deliver the next Bay Trail Atom only for 2014 products (with higher end Haswell for H2 2013), and it is just a 26nm process in terminology used by the foundry industry not a 22nm one touted by Intel

      Lesson from that: Intel may speak about its “22 nm SoC process” but given the late entry of its 32nm SoC process Atom product (Clover Trail) it would be better to assume that with Windows 8 tablets based on that it will affect only the 2014 tablet market, not earlier. This is what the latest leaks are suggesting as well. Meanwhile expect a low-power Haswell ULT based tablet PC push in H2 2013 as described already in my Intel Haswell: “Mobile computing is not limited to tiny, low-performing devices” [Nov 15 – Dec 11, 2012] post. As for next year, the real question is Can VIA Technologies save the mobile computing future of the x86 (x64) legacy platform? [this same blog of mine, Nov 23, 2012] For this watch what Allwinner vis-à-vis HTC on 2013 International CES [this same blog of mine, Dec 11, 2012] could bring in that respect, something much more than what is described in Allwinner A31 SoC is here with products and the A20 SoC is coming [USD 99 Allwinner blog of mine, Dec 10, 2012] or in $99 Android 4.0.3 7” IPS tablet with an Allwinner SoC capable of 2160p Quad HD and built-in HDMI–another inflection point, from China again [this same blog, Dec 3, 2012].

      – end of life of planar transistor and need to move to FinFET, but meanwhile FD-SOI to the rescue
      – ARM Physical IP division via its upcoming IP is preparing with its foundry partners (TSMC, GLOBALFOUNDRIES and Samsung) an easier transition to FinFET

      September 27 report:
      – TSMC’s View of the Semiconductor IP Ecosystem
      – Overall semiconductor IP market overview
      – The CEVA case
      – When sticking with the “Goliath”: ARM Holdings Plc
      – When sticking with a “David”: CAST Inc.

      Note: I am not discussing at all the most important development, the 64-bit ARM introductions, as I will devote a separate composite trend-tracking post on this blog to it.

      Warning: These two reports are rather comprehensive and extensive on the given subject. When you read them through, your reward will be a deep and wide-ranging understanding of this most topical issue for understanding the upcoming very dramatic changes in the further development of the whole ICT industry. To illustrate only some of the most related topics, here is a copy of the tags for this post:
      14 nm, 14nm, 20 nm, 20nm, 22 nm, 22nm, 28 nm, 28nm, 3D devices, Allwinner, AndesCore, ARM Artisan IP, ARM Holdings, ARM Physical IP division, Artisan Physical IP Platform, Atom, BA22-AP, Bay Trail, Beyond BA22, big.LITTLE Processing, bulk CMOS, CAST Inc., CAST IP, CEVA, choice IP partner, Cortex A15, Cortex-A7, EnSilica eSi-3250, Fastec Imaging Corporation, Fastec TS3, FD-SOI, finFET, foundries, foundry and IP business model, foundry business, Freescale, Freescale ColdFire, general-purpose foundry business, GlobalFoundries, Haswell, Haswell-ULT, in-house IP blocks, inflection points, Intel, Intellectual Property, interface products, Internet of Things, IOT, IP suppliers, Kinetis, LEON3, licensable IP blocks, Lincroft, logic products, mainstream CMOS, Mali, MarketsandMarkets, MediaTek, memory compilers, MIPS32, mobile computing, Motomic, MT6588, MT6589, OpenRISC, planar transistor, POP, prime IP partners, Processor Optimization Pack, reusable subsystems, Samsung, semiconductor design, semiconductor intellectual property market, semiconductor IP, semiconductor IP ecosystem, semiconductor IP market, semiconductor IP revenue, silicon IP market, SoC manufacturing process, SoC process, Sodaville, SOI, standard cells, standard industry IP blocks, STMicroelectronics, system IP, tablet PC, transistor designs, Tri-Gate, Tri-Gate transistor, TSMC, TSMC IP Alliance, TSMC IP portfolio, TSMC Soft-IP Alliance, UMC, VIA Technologies, Z670


      December 13 Report

      – Intel’s next-gen SoC manufacturing process will be able to deliver the next Atom only for 2014 products (with higher end Haswell for H2 2013), and it is just a 26nm process in terminology used by the foundry industry not a 22nm one touted by Intel

      Intel progressing in development of 14nm technology, says CTO [DIGITIMES, Dec 5, 2011]

      Intel CTO Justin Rattner on December 4 said that Intel’s development of 14nm technology is on schedule with volume production to kick off in one to two years and development of 18-inch wafers is under way through cooperation with partners.

      Rattner also noted that Intel’s aggressiveness over technology advancement will allow Moore’s Law to extend for another 10 years.

      At the end of 2013, Intel will enter the generation of 14nm CPUs (P1272) and SoCs (1273), while expanding its investments at its D1X Fab in Oregon, and Fab 42 in Arizona, the US and Fab 24 in Ireland, and will gradually enter 10nm, 7nm and 5nm process generations starting 2015.

      As for Intel’s competitors, Samsung is already set to enter 20nm in 2013 and is already working on its 14nm node, while Taiwan Semiconductor Manufacturing Company’s (TSMC) 20nm process [planar, i.e bulk CMOS, see below] will enter small volume production in the second half of 2013 with the first 3D-based FPGA chips to also start.

      Globalfoundries has previously announced its 14nm FinFET process will start pilot production at the end of 2013 and enter mass production in 2014.

      As for 18-inch wafers, Intel has invested in Holland-based ASML for its EUV technology, and related technologies are expected to start entering production in 2017.

      Intel Has No Process Advantage In Mobile, says ARM CEO [Mannerisms on Electronics Weekly, Oct 24, 2012]

      Intel has no advantage in IC manufacturing when it comes to manufacturing processes used for mobile ICs, Warren East, CEO of ARM, tells EW.

      “This time last year there was a lot of noise from the Intel camp about their manufacturing superiority,” says East, “we’re sceptical about this because, while the ARM ecosystem was shipping on 28nm, Intel was shipping on 32nm. So I don’t see where they’re ahead.”

      Furthermore, with the foundries accelerating their process development timescales, it looks increasingly unlikely that Intel will be able to find any advantage on mobile process technology in the future.

      “We’re supporting all the independent foundries,” says East. That includes 20nm planar bulk CMOS and 16nm finfet at TSMC; 20nm planar bulk CMOS and 14nm finfet at Samsung and 20nm planar bulk CMOS, 20nm FD-SOI and 14nm finfet at Globalfoundries.

      It gives the ARM ecosystem a formidable array of processes to choose from. “I’m no better equipped to judge which of these processes will be more successful than anyone else,” says East, “our approach is to be process agnostic.”

      The important thing is that the foundries’ process roadmap is on track to intersect Intel’s at 14nm.

      14nm will be the first process at which Intel intends to put mobile SOCs to the front of the node i.e. putting them among the first ICs to be made on a new process.

      Asked if the foundries were prepping their next generation processes with the intention of putting mobile SOC at the front of the node, East replies: “That’s the information we’re seeing from our foundry partners.”

      Globalfoundries intends to have 14nm finfet in volume manufacturing in 2014, the same timescale as Intel has for introducing 14nm finfet manufacturing.

      In fact, GF’s 14nm process may have smaller features than Intel’s 14nm process because, says Mojy Chian, senior vp at Globalfoundries, “Intel’s terminology doesn’t typically correlate with the terminology used by the foundry industry. For instance Intel’s 22nm in terms of the back-end metallisation is similar to the foundry industry’s 28nm. The design rules and pitch for Intel’s 22nm are very similar to those for foundries’ 28nm processes.”

      Jean-Marc Chery, CTO of STMicroelectronics, points out that the drawn gate length on Intel’s “22nm” process is actually 26nm.

      Furthermore Intel’s triangular fins, which degrade the advantages of finfet processing could underperform GF’s rectangular fins which optimise the finfet advantage.

      At the front of the GF 14nm finfet node will be mobile SOCs says Chian. GF has been working with ARM since 2009 to optimise its processes for ARM-based SOCs.

      At TSMC the first tape-out on its 16nm finfet process is expected at the end of next year. That test chip will be based on ARM’s 64-bit V8 processor.

      Using an ARM processor to validate its 16-nm finfet process should give TSMC’s ARM-based SOC customers great confidence.

      Asked about the effects of finfets on ARM-based SOCs, East replies: “There’s no rocket science in what you get out of it. The question is does it deliver the benefits at an acceptable cost? You don’t get something for nothing. How much does it cost to manufacture? How good is the yield? And that, of course, affects cost.”

      And so on goes Intel beating its head against the wall to get into the low-margin mobile business.

      Recently Intel said it expected its Q4 gross margin to drop 6% from Q3’s 63% to 57%. Shock, horror, said the analysts.

      But if Intel succeeds in the mobile business, its gross margin will drop a lot more than that.

      It’s a funny old world.

      The Truth About Intel [Mannerisms on Electronics Weekly, Dec 5, 2012]

      The darndest things are being said about Intel. The departure of its CEO is unexplained though I heard one person say it was voluntary.

      Some people think Apple will put x86 in the iPad.

      Others think Apple will drop x86 from iMacs so as to unify its processors across Phone, Pad and Mac.

      Sure as eggs are eggs, both can’t happen

      Some think Intel is going to become a foundry in a major way starting with Apple’s business – though it’s said the production cost of an Intel wafer is 3x that of a TSMC wafer.

      Others say Intel may make wafers for a few customers but will not enter an industry servicing thousands of customers with hundreds of thousands of mask-sets.

      Intel is to borrow $6 billion to buy its own shares something it has been doing for some time. I am too financially unsophisticated to understand why it does this but, even before this latest borrowing, Intel’s debt was already pretty high at over $7 billion and its cash rather low – for a cash generative, capex-gobbling company – at $10.5 billion.

      The divi is generous – but if the purpose of the generosity is to keep the share price up, then the generosity hasn’t worked – Intel’s share price is under $20, unchanged in a decade.

      The strategy of getting x86 into mobile phones seems mistimed when Apple and Samsung and now LG are designing their own mobile phone processors. This morning Samsung said it will start mass-producing its own-brand 28nm processors for mobile devices early in 2013.

      Intel’s fab situation at 22nm looks tough with 50% utilisation. A $500 million charge for this is expected to be taken in Q4.

      Intel’s claim to have a manufacturing advantage looks unconvincing when its 22nm process turns out to have a drawn gate length of 26nm – virtually the same as volume processes at  leading foundries.

      Where it matters, i.e. in the mobile market, Intel has no process advantage at all because Intel hasn’t yet put its mobile SOCs on its latest process at the start of a node. Intel’s mobile SOCs won’t enjoy early access to a new process node until the 14nm generation.

      And was finfet the right bet? 20nm planar may still be made to work, while FD-SOI could turn out to be a better route than finfet.

      Meanwhile CEO Paul Otellini won the 2012 Open-Mouth-Insert-Foot Award by some spectacular boo-boos:

      • Saying Windows 8 wasn’t ready just before its launch, provoked Microsoft’s riposte that Intel’s power management software wasn’t ready for the launch of Surface, Microsoft’s Windows 8 tablet.
      • And endorsing Governor Mitt Romney in the recent US presidential elections probably irked the White House just as Otellini was earning some brownie points by sitting on a Presidential committee. They were much needed brownie points after Intel’s pasting from the FTC for ‘stifling innovation.’

      And all the while and worst of all, the PC industry starts to contract and Intel has won few slots in the successor to the PC industry – the mobile device industry.

      All in all a pretty rotten year for Intel despite taking in over $50 billion in revenues and earning over $12 billion in profits.

      Even silver linings can have clouds.

      So the war is on as per: IBM, Intel face off at 22 nm [EE Times, Dec 10, 2012]

      SAN FRANCISCO – Intel and IBM went head-to-head with their latest 22-nm technologies in back-to-back papers at the International Electron Devices Meeting (IEDM) here Monday (Dec. 10). Separately, a top Intel fab executive commented on increasing wafer costs and the company’s foundry business.

      IBM said it is prototyping server processors in a new 3-D ready, 22-nm process technology it hopes will deliver 25 to 35 percent boosts over its 32-nm node. Intel retains an edge with several 22-nm chips already in volume production, and disclosure at IEDM of a variant of the process for SoCs for a wide range of applications.

      The Intel paper showed support for “high drive current across the spectrum of leakage and a full suite of SoC tools,” Mark Bohr, head of Intel’s process technology development group, said in a brief interview. The process is geared for a much wider array of designs than that of IBM, he added.

      Bohr said Intel’s 22-nm FinFET process is cost effective, contradicting reports that it is 30 to 40 percent more expensive than TSMC’s 28-nm planar process. The addition of FinFET adds only 3 percent to the cost of the process. Its 80-nm minimum feature sizes can be made with a single pass of 193-nm lithography tools, making it cost effective.

      Projections from an IMEC keynote that 14-nm wafers will be 90 percent more expensive than 28-nm parts due to the lack of EUV lithography are inaccurate, Bohr asserted. The cost increase for 14-nm wafers at Intel “is nowhere near that,” he said.

      “Cost per wafer has always gone up marginally each generation, somewhat more so in recent generations, but that’s more than offset by increases in transistor density so that the cost per transistor continues to go down at 14 nm,” Bohr said.
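Bohr's argument is simple arithmetic: cost per transistor falls as long as density grows faster than wafer cost. The sketch below illustrates this with the 90 percent wafer cost increase from the IMEC projection quoted above; the density ratios are hypothetical values chosen only for the example, and the function name is made up.

```python
def relative_cost_per_transistor(wafer_cost_ratio: float, density_ratio: float) -> float:
    """Cost per transistor on the new node relative to the old node.

    wafer_cost_ratio: new wafer cost divided by old wafer cost
    density_ratio:    new transistors per wafer divided by old transistors per wafer
    """
    if density_ratio <= 0:
        raise ValueError("density_ratio must be positive")
    return wafer_cost_ratio / density_ratio


# Wafer 90% more expensive with a hypothetical 2x density gain:
# cost per transistor still drops slightly.
print(relative_cost_per_transistor(1.9, 2.0))

# The same wafer cost increase with only a hypothetical 1.5x density gain:
# cost per transistor rises.
print(relative_cost_per_transistor(1.9, 1.5))
```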

      Separately, Bohr said Intel does have a growing foundry business that may include some higher volume applications than its current announced customers like FPGA startup Achronix. However, “we don’t intend to be in the general-purpose foundry business…[and] I don’t think the [foundry] volumes ever will be huge” for Intel, he said.

      Intel’s paper laid out characteristics of Intel’s 22-nm process variation for SoCs (see chart below). It outperforms Intel’s 32-nm planar process by 20 to 65 percent and covers four orders of magnitude in leakage current, said co-author C.H. Jan.

      image

      The process provides 51 to 56 percent improvements in high voltage performance used for fast interfaces such as Ethernet, HDMI and PCI Express. That’s more than twice the 20 percent boost typical in this area for a new Intel node, Jan said.

      In addition, analog performance went up three-fold after declines in the past three nodes. Intel offers a small library of analog circuits tailored to the process including precision resistors, metal-in-metal capacitors and high Q inductors.

      The process supports high and standard performance options as well as low and ultra low power ones. It also includes SRAM designs optimized for density, power and performance some of which now hit 2.6 GHz at 1V, up from 1.8 GHz at 32 nm.

      Finally, Intel created two new transistor designs specifically for the 22-nm SoC variant. One is focused on low power and the other on high voltage for mixed-signal and analog circuits (see chart above).

      image

      For its part, IBM described its 22-nm process using partially depleted silicon-on-insulator. IBM “has prototyped a number of server processors” in the node that achieve latency below 1.5 ns and 750 MHz random clock cycles, said IBM researcher S. Narasimha.
      Narasimha declined to give specifics of what IBM might achieve with the 22-nm node. However he did say the goal was to provide 25 to 35 percent boosts of the previous node which delivered server processors running up to 5.5 GHz and others with up to 80 Mbytes embedded DRAM.
      IBM created an SRAM cell that measures 0.026 mm2 using the process. It also supports power supplies at 1.2V across a 550 mm2 die area, he said.
      The process provides up to 15 levels of metal. The lowest five levels use 80-nm features, similar to the Intel process, and the top two levels support through-silicon vias for 3-D stacks with memory chips.
      IBM will deliver a separate paper Wednesday on its 3-D stacking work.

      Before that there was: Intel describes 22-nm SoC process, not chips [EE Times, Sept 13, 2012]

      Intel provided the first look at the system-on-chip variant of its 22-nm process technology in a talk at the Intel Developer Forum here Thursday (Sept. 13). However, it declined to provide details on the Atom-based SoCs for tablets and smartphones that will be made in that process.

      “It’s fair to say Intel didn’t have much of a focus four or five years ago on SoCs, but that’s changed,” said Mark Bohr, director of Intel’s technology and manufacturing group in a process technology talk. “The success of Medfield [Intel’s 32-nm smartphone platform] shows we are learning to do it right, and I think we will have a technology advantage at 22 nm,” he said.

      Intel showed at IDF six smartphones and four Windows 8 tablets using the Medfield SoC, made in an SoC variant of its 32-nm process. “There’s a lot more in the pipeline,” said Ticky Thakkar, a lead Atom designer in a separate talk on the mobile chips.

      The company is already shipping to OEMs a 2-GHz version of Clover Trail, a follow-on 32nm dual-core processor with boosted graphics. A 1.8-GHz version for tablets is also in the works.

      Next up is Bay Trail, Intel’s first 22-nm SoC for tablets and smartphones, expected to debut at IDF Beijing [April 10-11, 2013 as per the IDF page of Intel]. “You’ll have to wait until next year to hear about it,” said Thakkar.

      In a separate talk, Bohr described P1271, the 22-nm SoC process to be used for Bay Trail. It differs from the 22-nm CPU process now used for Intel’s Ivy Bridge processors by offering lower leakage logic transistors, higher voltage I/O transistors, denser upper layer interconnects and a set of precision resistors, capacitors and inductors.

      image

      “It’s not one set of features, but a menu of feature options—transistors, I/O, interconnects, passive elements and embedded memory,” Bohr said. “The [SoC] transistors go down to much lower leakage levels, but give up some performance,” he said.

      image

      The process has significantly better analog characteristics than Intel’s current 32-nm planar process. Designs make heavy use of 80-nm pitch features in lower metal layers, because they are the smallest features Intel can make at 22 nm without needing double patterning, he added.

      Intel is running the process at three fabs, two in the U.S. and one in Israel. It will ramp soon in two other fabs.

      Reminders: Silicon Technology for 32 nm and Beyond System-on-Chip Products [IDF 2009 presentation by Mark Bohr, Sept 23, 2009]

      image     image

      image   image

      image

      Products (Formerly Lincroft) [Intel page]
      – Number of Products: 5
      – Launch Range: Q2’10 – Q2’11
      – Max TDP: 1.3W (Z600) – 3W (Z670)
      Z600 (512K Cache, 1.20 GHz)
      Z670 (512K Cache, 1.50 GHz)

      while the first SoC product was the Sodaville which had no real market success (even specs are not listed on the ark.intel.com), and as such was not continued:
      Intel Unveils 45nm System-on-Chip for Internet TV  [press release, Sept 24, 2009]

      Intel Corporation today unveiled the Intel® Atom™ processor CE4100, the newest System-on-Chip (SoC) in a family of media processors designed to bring Internet content and services to digital TVs, DVD players and advanced set-top boxes.

      The CE4100 processor, formerly codenamed “Sodaville,” is the first 45nm-manufactured consumer electronics (CE) SoC based on Intel architecture. It supports Internet and broadcast applications on one chip, and has the processing power and audio/video components necessary to run rich media applications such as 3-D graphics.

      Intel® Atom™ Processor CE4100
      The CE4100 processor can deliver speeds up to 1.2GHz while offering lower power and a small footprint to help decrease system costs. It is backward compatible with the Intel® Media Processor CE 3100 and features Intel® Precision View Technology, a display processing engine to support high-definition picture quality and Intel® Media Play Technology for seamless audio and video. It also supports hardware decode of up to two 1080p video streams and advanced 3-D graphics and audio standards. To provide OEMs flexibility in their product offerings, new features were added such as hardware decode for MPEG4 video that is ready for DivX* Home Theater 3.0 certification, an integrated NAND flash controller, support for both DDR2 and DDR3 memory and 512K L2 cache. The CE SoC contains a display processor, graphics processor, video display controller, transport processor, a dedicated security processor and general I/O including SATA-300 and USB 2.0.

      Lincroft is mentioned in my Windows 7 tablets/slates with Oak Trail Atom SoC in December [Nov 1-24, 2010] post as:

      Intel “is aiming to mass produce its Oak Trail platform for its Sleek Netbook segment targeting the tablet PC market in December 2010. The Oak Trail platform is a combination of Intel’s Lincroft (Atom Z6xx series) processor with Whitney Point chipset.”

      The Oak Trail platform will sell at about US$25 with MeeGo [which was terminated as Nokia exited that joint effort 3 months later], and the price for Oak Trail and Microsoft’s Windows 7 will be higher.

      so it was Intel’s first attempt to compete against the ARM-based tablet business, including the already successful iPad, and it went nowhere in terms of volumes. An adjustment followed early, as noted in my Intel: accelerated Atom SoC roadmap down to 22nm in 2 years and a “new netbook experience” for tablet/mobile PC market [April 17, 2012] post, despite the fact that products based on the Z670 Atom from Lenovo and Fujitsu (the big names) and from Evolve, Motion Computing, Razer and Viliv (much lesser names) appeared on the market from April 2011 on (you can find information about them in the post itself). The price was too high: e.g. $729 for the Evolve III Maestro C.

      The next Atom, based on Intel’s 32nm SoC process, appeared only recently, first in Acer Iconia W510: Windows 8 Clover Trail (Intel Z2760) hybrid tablets from OEMs [Oct 28, 2012], priced a little lower, from $499 and up, which is still overpriced relative to current 10” Android tablets. Moreover, it became available only in the second half of November and appeared in the Microsoft Store to celebrate Cyber Monday (Nov 26) discounted to $399, which is the only competitive price. Now it is back to $499.

      Lesson: Intel may speak about its “22 nm SoC process” but given the late entry of its 32nm SoC process Atom product (Clover Trail) it would be better to assume that with Windows 8 tablets based on that it will affect only the 2014 tablet market, not earlier. This is what the latest leaks are suggesting as well. Meanwhile expect a low-power Haswell ULT based tablet PC push in H2 2013 as described already in my Intel Haswell: “Mobile computing is not limited to tiny, low-performing devices” [Nov 15 – Dec 11, 2012] post. As for the next year the real question is Can VIA Technologies save the mobile computing future of the x86 (x64) legacy platform? [this same blog of mine, Nov 23, 2012] For this watch what Allwinner vis-à-vis HTC on 2013 International CES [this same blog of mine, Dec 11, 2012] could bring in that respect, something much more than what is described in Allwinner A31 SoC is here with products and the A20 SoC is coming [USD 99 Allwinner blog of mine, Dec 10, 2012] or in $99 Android 4.0.3 7” IPS tablet with an Allwinner SoC capable of 2160p Quad HD and built-in HDMI–another inflection point, from China again [this same blog, Dec 3, 2012].

      End of Reminders


      – end of life of planar transistor and need to move to FinFET, but meanwhile FD-SOI to the rescue

      FinFETs or FD-SOI? [SemiMD (Semiconductor Manufacturing and Design), Dec 11, 2012]

      By Ed Sperling
      STMicroelectronics yesterday unveiled the results of its 28nm production silicon chips using fully depleted silicon on insulator technology, which it claims offers a 30% improvement in speed over bulk CMOS while using less power.

      The debate over FD-SOI and FinFETs has been notching up over the past few months. While FinFETs and FD-SOI both promise improvements in controlling leakage current, the FinFETs are more difficult to design. FD-SOI uses the same design flow, although it does use a different SPICE model with better characteristics than the one used for bulk CMOS.

      ST also used an ultra thin body and box (UTBB) and body biasing to boost performance, according to Joel Hartmann, the company’s executive vice president of front-end manufacturing and process R&D. Hartmann presented his results at an SOI Consortium-sponsored event at the IEDM show last night.

      “We are using body bias to boost performance,” Hartmann said. “You can do that with FD-SOI. We also decreased the Vdd of the device by applying body biasing.”

      What’s particularly attractive about FD-SOI is that it can be implemented at the 28nm node for a boost in performance and a reduction in power. The mainstream process node right now is 40nm. And while Intel introduced its version of a finFET transistor called Tri-Gate at 22nm, TSMC and GlobalFoundries plan to introduce it at the next node—whether that’s 16nm or 14nm. That leaves companies facing a big decision about whether to move all the way to 16/14nm to reap the lower leakage of finFETs, whether to move to 20nm on bulk, or whether to stay longer at 28nm with FD-SOI.

      Hartmann said ST has seen improvements in analog running on FD-SOI, and for memory where the minimum voltage required is lower. He said ST’s road map calls for FD-SOI all the way down to 10nm, with voltages dropping from 0.9v at 28nm to 0.8v at 14nm and 0.7v at 10nm.

      One of the sticking points in adopting FD-SOI has been market acceptance. Despite the promise of improved performance and/or lower power, bulk CMOS has been extended using a variety of techniques such as strain engineering and FD-SOI is considered more expensive. At 28nm and beyond, however, bulk has run out of steam, which is why Intel has opted for finFETs.

      Still, FinFETs are more difficult to design and manufacture, and they potentially can add significantly to the cost of an SoC. FD-SOI, in contrast, uses the same design tools and reduces the number of masks and metal layers. ST is the first large fab-lite company to adopt FD-SOI and to move beyond just test chips. It remains to be seen which path the rest of the industry takes—and how quickly.

      Increasing Levels Of Risk [SperlingMediaGroup YouTube channel, Dec 11, 2012]

      Semiconductor Manufacturing & Design sits down with Mentor Graphics’ Jean-Marie Brunet to talk about double patterning, FinFETs, design rules at advanced nodes and why design for manufacturing (DFM) has suddenly become so popular.

      Inflection Points [SperlingMediaGroup YouTube channel, Aug 14, 2012]

      Semiconductor Manufacturing and Design talks with Paul Boudre, chief operating officer at Soitec, about FinFETs, industry inflection points, the end of life for planar transistors, bulk CMOS vs. SOI, the differences between fully depleted and partially depleted SOI, and the FD-SOI ecosystem.

      See also: ST’s FD-SOI Tech Available to All Through GF [SemiMD (Semiconductor Manufacturing and Design), Oct 8, 2012]


      – ARM Physical IP division via its upcoming IP is preparing with its foundry partners (TSMC, GLOBALFOUNDRIES and Samsung) an easier transition to FinFET

      2012 ARM TechCon John Heinlein Interview [chipestimate YouTube channel, Dec 4, 2012]

      Sean O’Kane, Producer/Host ChipEstimate.TV John Heinlein, VP Marketing, Physical IP Division at ARM

      TSMC OIP 2012 – Sit down with John Heinlein, ARM [chipestimate YouTube channel, Dec 4, 2012]

      Sean O’Kane, Producer/Host ChipEstimate.TV interviews at TSMC OIP 2012

      An introductory video for the roundtable video that follows:
      ARM 16/14nm FinFET Manufacturing Leadership [Charbax YouTube channel, Nov 1, 2012]

      John Heinlein, Vice President of Marketing, Physical IP Division at ARM talks about the 14nm FinFET ARM Processor manufacturing technology that is being developed and that is starting to be manufactured next year.

       

      ARM TechCon 2012 Executive Roundtable: Manufacturing [ARMflix YouTube channel, Nov 14, 2012]

      Embedded in the beginning of this roundtable video there is a [4:19] minutes long Investing in FinFET Technology Leadership Presented by ARM [ARMflix YouTube channel, Nov 12, 2012] video in which Dr. Rob Aitken, R&D Fellow at ARM, discusses the need for new transistor technologies and how FinFET may be a solution. The embedded video is starting at [00:39] of the roundtable video. From this I will transcribe here the following part showing ARM’s commitment and strategy for FinFET in its Physical IP Division:

      [02:30] ARM is taking a leadership position in FinFET IP development to accelerate the availability of FinFET IP in the ARM partnership. We are working closely with foundry partners to develop prototype FinFET physical IP early in the process lifecycle. Using this prototype physical IP ARM is currently developing two different FinFET test chips both taping out in Q3 2012. These efforts continue ARM’s commitment to early development of silicon testing to reduce risk and time to market. Through our early engagement and prototyping work we actively provide feedback to our foundry partners to assure that FinFET technology is well suited to the requirements of energy efficient SoCs. ARM is further contributing to the technical community by publicly releasing fully predictive FinFET transistor models based on the ITRS roadmap and is extending these models to more advanced FinFET designs. Internally we are modeling proprietary foundry technologies in support of the development work on those processes. This is just the beginning of ARM’s commitment to FinFET IP leadership. [03:46]

      This ARM TechCon panel included the following speakers:
      Moderator: Dr. John Heinlein, VP, Marketing, Physical IP Division, ARM
      Panelists:
      Simon Segars, EVP and GM, Processor and Physical IP Divisions, ARM
      Gregg Bartlett, SVP & CTO, GLOBALFOUNDRIES
      Dr. Jong-Shik Yoon, SVP, Logic Technology Development, Semiconductor R&D, Samsung
      Dr. Shang-yi Chiang, EVP & Co-COO, TSMC

      There is a good deal of other ARM-specific information about its FinFET efforts in the September 27 report, which is the following major section. Now some additional items from its foundry partners:

      Breathing New Life into the Foundry-Fabless Business Model [ARM’s SoC Design blog, Aug 21, 2012]

      Early last week, GLOBALFOUNDRIES jointly announced with ARM another important milestone in our longstanding collaboration to deliver optimized SoC solutions for ARM® processor designs on GLOBALFOUNDRIES’ leading-edge process technology. We’re extending the agreement to include our 20nm planar offering, next-generation 3D FinFET transistor technology, and ARM’s Mali™ GPUs.
      Our collaboration with ARM goes back many years, and its evolution parallels some of the critical developments in the larger semiconductor industry during the same timeframe. ….

      This early and deep collaboration has resulted in several significant milestones, including the world’s first foundry optimized Cortex-A9 processor, POP™ IP for the Cortex-A9 processor operating at 1.6GHz on our 28nm-SLP technology, and a demonstration of more than 2GHz on our 28nm high-performance technology. This platform builds on the existing ARM Artisan® physical IP platforms for GLOBALFOUNDRIES processes at 65nm, 55nm and 28nm.

      Now we are extending this collaboration to include true joint optimization for 20nm technologies and beyond, as well as a new focus on GPUs, which are becoming increasingly important in today’s smart mobile devices. The TQV strategy has already been scaled to 20nm and is an integral part of our process development, with a 20nm test chip implementation currently running through our Fab 8 in Saratoga County, N.Y.

      And while we are seeing great dividends from this collaboration, the real hard work is only just beginning. We are now leveraging historical synergies from 28nm and 20nm planar technology to enable a smooth migration to next-generation, three-dimensional FinFET technology. One of the well publicized benefits of FinFET technology is its superior low-power attributes. The intrinsic capability of the 3D transistor to operate at a lower Vdd translates to longer battery life, which is heavily sought after in performance-hungry mobile computing applications. Our collaboration is focused tightly on this sweet spot in the market, where designers are looking for the optimum combination of performance, power-consumption, area, and cost. Our co-development work with ARM will enable a faster time to FinFET SoC solutions for customers using ARM’s next generation of mobile SoC IP for both CPUs and GPUs.

      So clearly the foundry-fabless business model is not collapsing, but rather adapting to meet the challenges of today. Success will be a result of much closer joint development at the technology definition level, early engagement at the architectural stage, and a more integrated and cooperative ecosystem – precisely the kind of collaboration that we’re demonstrating with our valued partner ARM.

      Guest Partner Blogger:

      Mike Noonen is Executive Vice President, Worldwide Marketing and Sales, for GLOBALFOUNDRIES. In this role, he is responsible for global customer relationships as well as all marketing, sales, customer engineering and quality functions.

      GLOBALFOUNDRIES at ARM Techcon 2012 [Charbax YouTube channel, Oct 30, 2012]

      Talking about the fabrication of ARM Processors, from 28/32nm HKMG to 20nm to upcoming FinFET 14nm process technologies with Subramani Kengeri, Vice President, Technology Architecture, Office of the CTO, Paul Colestock, Director, Strategic Marketing and Srinivas Nori, Director, Marketing, SoC Innovation at GlobalFoundries at ARM Techcon 2012.

      If interested in the GLOBALFOUNDRIES Fireside Chat mentioned here watch the separate video GLOBALFOUNDRIES Fireside Chat at ARM Techcon 2012 [Charbax YouTube channel, Oct 31, 2012] with the following content:

      “The insatiable need for functional and feature integration on to Mobile SoCs, coupled with ever increasing performance demands has challenged the Foundries and Fabless Semiconductor companies alike. While the diminishing geometries of the process technologies have kept pace to address this challenge, the solutions for leakage power dissipation continued to fall behind threatening to thwart the advances in Mobility. The ground-breaking FinFET technology is the right low-power solution and will serve as an inflection point to further enable SoC-level integration and technological advances in this exciting era of Extreme Mobility. The panel will discuss how the next generation of FinFET technology will change the mobile revolution again.”

      Speakers

      Dean Freeman, Research VP, Gartner Research
      Bruce Kleinman, VP, Product Marketing, GLOBALFOUNDRIES
      Subramani Kengeri, Vice President, Technology Architecture Office of the CTO, GLOBALFOUNDRIES
      Srinivas Nori, Director, SOC Innovation, GLOBALFOUNDRIES
      Dipesh Patel, Deputy General Manager of the Physical IP Division, ARM

      TSMC’s information about collaboration with ARM in FinFET space was already included in the second major section (September 27 Report) beginning from ARM and TSMC Collaborate to Optimize Next-Generation 64-bit ARM Processors for FinFET Process Technology [ARM press release, July 23, 2012] part in the text. As an update to that I will include here:  TSMC Accelerates finFET Efforts [SemiMD (Semiconductor Manufacturing and Design), Oct 16, 2012]

      In response to its foundry rivals, Taiwan Semiconductor Manufacturing Co. Ltd. (TSMC) has updated and accelerated its process roadmap. The world’s largest silicon foundry has accelerated its 16nm finFET efforts by one quarter and added a 10nm finFET technology to the roadmap.

      TSMC also plans to take the “modular fin” approach for its 16nm finFET. It is also looking at 450mm fabs at the 10nm node, according to a TSMC executive, who also stressed that collaboration is a key to success. Customers must collaborate earlier in the design cycle and “at a new level,” said Mark Liu, executive vice president and co-chief operating officer at TSMC, during a keynote at the company’s Open Innovation Platform Ecosystem Forum in San Jose, Calif. on Tuesday (Oct. 16). “We need to align strategically.”

      At present, TSMC is ramping up its 28nm process technology. The next process on the roadmap, dubbed CLN20, is a 20nm planar technology. The reference flow for CLN20 is ready and the process is due out in 2013.

      [See: TSMC 20nm and CoWoS™ Design Infrastructure Ready [TSMC press release, Oct 9, 2012]]

      Then, as previously announced, TSMC will enter the finFET transistor era. The company’s initial finFET process, dubbed CLN16FF, is being targeted and branded for the 16nm node. TSMC’s 16nm finFET process is slated for risk production in November of 2013, Liu said. Risk production has been accelerated from February of 2014 to November of 2013.

      In an interview after the keynote, Liu said TSMC will take a “modular fin” approach in finFETs. TSMC will marry a 16nm fin with a 20nm backend. “It has 20nm design rules,” he said.

      TSMC will also implement a triple-patterning strategy for 16nm finFETs. The company is also keeping its options open. It is exploring 193nm immersion extensions, extreme ultraviolet (EUV) lithography and multi-beam. “At this point, we have both (193nm extensions and EUV) under development,” he said. “Maybe multi-beam will save the day.”

      TSMC’s 16nm finFET design solutions, including the EDA tools and IP, will be ready by the first quarter of 2013.  “We have pulled in our design enablement solutions,” said Cliff Hou, senior vice president of TSMC, during a separate keynote at the event. The first version of the design solutions, dubbed V0.1, is slated for introduction in January. The second version, V1.0, is due out in October of 2013.

      Meanwhile, during his keynote, Liu presented a slide that denoted CLN10FF, which is a second-generation finFET for the 10nm node. TSMC’s 10nm finFET process is expected to move into risk production close to the end of 2015, he said.

      Also at 10nm, TSMC is looking to enter the 450mm fab era. It is likely TSMC will have a 450mm fab or pilot line in the second phase of 10nm. “There are no show stoppers,” he said. “All of the equipment companies are developing 450mm.”

      Other foundries have also accelerated their finFET roadmaps. For example, GlobalFoundries Inc. recently rolled out its finFET technology for the 14nm node. GlobalFoundries is taking a “modular fin” approach with its bulk finFET offering, dubbed 14nm-XM. The 14nm-XM combines a 14nm-class fin with its 20nm back-end-of-line (BEOL) interconnect flow.

      By taking the modular approach, the company has accelerated its process roadmap by a year. Early process design kits (PDKs) are available, with customer product tape-outs expected in 2013. Production, which is slated for 2014, will take place within GlobalFoundries’ new 300mm fab in New York.

      Another foundry vendor, United Microelectronics Corp. (UMC), is taking a similar modular finFET approach. UMC licensed finFET technology from IBM. Samsung Electronics Co. Ltd. has yet to elaborate on its finFET strategy.  Meanwhile, Intel Corp. is already ramping up its 22nm process, which is based on finFET transistors. Intel is providing foundry services for select customers, who plan to ship products based on finFETs.

       


      September 27 Report

      In my role, I serve as one of the members of the Global Semiconductor Alliance (GSA) Steering Committee on Intellectual Property, where we work to share best practices and continue to improve the IP ecosystem for the benefit of the entire semiconductor industry. As part of this role, I’ve observed a trend in the news speculating on the future of the foundry and IP industry, and I recently posted my thoughts on the GSA blog site, and I’d like to share them with you here as well.

      In 1897, after a journalist erroneously reported the passing of famed author and humorist Mark Twain, Twain replied in his typical wit with the now famous retort: “the rumor of my death has been greatly exaggerated.”  Like the then very alive author, recent reports have speculated on the demise of the foundry and IP business model.  I similarly think such talk is pure nonsense.  Across many metrics the foundry and IP space is alive and well and providing unprecedented capabilities to semiconductor companies. [his factual argumentation for that you can find much below, in the <<sticking with the “Goliath”>> section]

      Dr. John Heinlein, Vice President, Marketing, ARM Physical IP Division on May 16, 2012


      TSMC’s View of the Semiconductor IP Ecosystem

      To understand the semiconductor IP ecosystem one should first look at it through the IP-related efforts of by far the biggest and most influential foundry, TSMC (as their success most heavily depends on a vibrant and quality IP ecosystem):

      ChipEstimate.com DAC 2012 IP Talks presenter Dan Kochpatcharin on TSMC OIP and IP Quality [chipestimate YouTube channel, June 26, 2012]

      Dan Kochpatcharin, Deputy Director, IP Portfolio Marketing, TSMC. IP Talks presenter with ChipEstimate.com at DAC 2012 in San Francisco. TSMC OIP (Open Innovation Platform alliance ecosystem) and IP Quality. For more information about TSMC , go to: http://www.chipestimate.com/tsmc/

      There are 41 IP partners in the semiconductor IP specific TSMC IP alliance program of TSMC OIP (Open Innovation Platform alliance ecosystem), and TSMC also directly supports 20-25 IP partners that are not part of the IP alliance program.

      image

      Among those the winners of the 2011 TSMC IP Partner Award of Year were:

      Note that, for such IP excellence, the organisations behind it are not big at all. Dolphin Integration SA is a 190-person company. eMemory employs around 200 people as per the award news release. While ARM Holdings Plc had 2,253 full-time employees altogether at June 30, 2012, its Physical IP Division (PIPD) accounts for just 11% of overall revenue, so the number of employees there would probably not exceed 300. Artisan Components Inc. (US), acquired by ARM Holdings for not less than 1 billion US$ in Dec 2004 (because of “collaboration between the two companies on ARM’s next-generation MPU core, code-named ‘Tiger’”, which in 2005 became Cortex-A8), had 72 employees in 1997, so this estimate is plausible from a historical point of view as well (even considering ARM’s heavy investment later on).

      As far as Synopsys is concerned, 9 months ago it had ~6800 employees, but its portfolio is rather large (implementation, verification, IP, manufacturing and FPGA solutions), and in addition to the Interface IP the company has Analog IP and Memories and Logic Libraries as well in the overall DesignWare IP portfolio. To understand that split let’s take the following “Top Interface, Analog, and Embedded Memory IP Vendor” presentation slide from the Synopsys Investor Day 2011 presentation, referring to a Gartner, March 2011 report, which indicates $104.1M interface IP revenue for 2010:
      image
      which is ~7.5% of the overall revenue of Synopsys ($1.38B for the fiscal year 2010 ending Oct 31, 2010, when it had 6707 employees), which could mean ~500 employees related to Interface IP activities, taken proportionally to the revenue.
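
The two back-of-the-envelope headcount estimates above (ARM's Physical IP Division and Synopsys' interface IP activities) can be reproduced with a minimal sketch. It assumes, as the text does, that headcount scales in proportion to revenue share, which is a simplification; the helper name `proportional_headcount` is purely illustrative.

```python
def proportional_headcount(total_employees: int, revenue_share: float) -> float:
    """Estimate headcount of a business line, ASSUMING staff scale with revenue share."""
    if not 0.0 <= revenue_share <= 1.0:
        raise ValueError("revenue_share must be between 0 and 1")
    return total_employees * revenue_share


# ARM: 2,253 employees overall, Physical IP Division at 11% of revenue (figures from the text).
arm_pipd = proportional_headcount(2253, 0.11)

# Synopsys: $104.1M interface IP revenue out of $1.38B total, 6707 employees (figures from the text).
synopsys_share = 104.1 / 1380.0
synopsys_interface_ip = proportional_headcount(6707, synopsys_share)

print(f"ARM PIPD estimate: {arm_pipd:.0f} employees")
print(f"Synopsys interface IP revenue share: {synopsys_share:.1%}")
print(f"Synopsys interface IP estimate: {synopsys_interface_ip:.0f} employees")
```

Both results land close to the rounded figures quoted in the text (under 300 for ARM's PIPD, roughly 500 for Synopsys' interface IP); they are rough proportions, not reported headcounts.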

      And here is the number of titles in TSMC’s IP portfolio, compared with other foundries:

      image

      See also:
      TSMC Extends Open Innovation Platform™ [TSMC press release, June 7, 2010]
      TSMC Expands IP Alliance to Include Soft IP [TSMC press release, Oct 5, 2010]
      Atrenta and TSMC IP Quality Initiative Gains Broad Industry Acceptance [Atrenta press release, March 5, 2012]: “10 intellectual property (IP) providers have qualified their soft IP for inclusion in the TSMC 9000 IP library using the Atrenta IP Handoff Kit. Those companies, part of TSMC’s Soft-IP Alliance Program, include Arteris, Inc.; CEVA; Chips&Media, Inc.; Digital Media Professionals Inc. (DMP); Imagination Technologies; Intrinsic-ID; MIPS Technologies, Inc.; Sonics, Inc.; Tensilica, Inc.; and Vivante Corporation. The participating companies are able to provide quantitative information to TSMC’s customers regarding the robustness and completeness of their soft or synthesizable semiconductor IP that is part of the TSMC 9000 IP library.”
      Imagination Technology Forum: Advanced SoC solutions in cooperation with TSMC [detailed DIGITIMES report, June 28, 2012]: “Not only will we be introducing our latest graphics processing IP, we will also talk about video, displays, multi-threaded cores [Meta SoC Processors], and wireless processors [Ensigma Universal Communications Core Processors (UCCPs)]. We hope that industries can further understand that Imagination is a company that provides complete SoC solutions.”
      TSMC Open Innovation Platform® Ecosystem Forum, Technical Presentation Abstracts [TSMC, Oct 18, 2011]
      ARM Physical IP Overview [ARM presentation, Sept 9, 2011]
      Leveraging Advanced Physical IP to Deliver Optimized SoC Implementations at 40nm and below [ARM presentation, Nov 19, 2010]
      ARM Announces Processor Optimization Pack [ARM press release, Nov 9, 2010]

      ARM today announced the immediate availability of the ARM® Cortex™-A9 Processor Optimization Packs (“POPs”). Processor Optimization Packs leverage ARM Artisan® physical IP to enable customers to achieve technology leading performance or power targets on their Cortex-A9 implementations in the shortest time to market. A silicon-proven POP is available now for TSMC® 40nm G process technology. The Cortex-A9 POP on TSMC 40nm LP process technology will be available to customers in January 2011.
      The Cortex-A9 Processor Optimization Packages contain three elements: ARM Artisan optimized logic and memory physical IP for a specific process technology, supported by implementation knowledge and ARM benchmarking.  Combined together the POP allows SoC designers to optimize Cortex-A9 designs for maximum performance, lowest power or to develop customized solutions balancing power and performance for their specific application.

      – Overall semiconductor IP market overview

      The key players listed by the market researcher MarketsandMarkets (with ChipEstimate.com links wherever possible, where “Prime IP Partners” are highlighted in bold) are the following companies:

      ARM Holdings Plc (UK)
      Atmel Corporation
      CAST Inc.
      CEVA Inc. (Israel, Choice IP Partner)
      Coreworks S.A. (Portugal), but see Homepage, Technologies, Products, Rapidity
      Dolphin Integration Inc.
      Imagination Technologies Inc.
      Lattice Semiconductor, but see its IP website
      Mentor Graphics, Inc.
      MIPS, Inc., but see Processor Cores, Interconnect IP, and MIPS Alliance
      MoSys, Inc., but see unparalleled bandwidth performance for next gen networking systems
      NXP Semiconductors N.V
      Rambus, Inc.
      Silicon Image, Inc.
      Synopsys, Inc.
      Tensilica, Inc. (Choice IP Partner)
      Triad Semiconductor, Inc., but see Mixed Signal ASIC, Engagement Model, IP Catalog, ARM Powered VCAs
      VeriSilicon, Inc. (Choice IP Partner)
      exited: Wipro-NewLogic, Inc., but see RivieraWaves (France) as a successor
       
      Notes:
      1. ChipEstimate.com Chip Planning Portal Overview
        The ChipEstimate.com chip planning portal is an ecosystem comprised of over 200 of the world’s largest semiconductor design and verification IP suppliers and foundries. These companies all share in the common vision of helping the worldwide electronics design community achieve greater profitability and success. To date, a diverse global audience of over 27,000 users has joined the ChipEstimate.com community and has collectively performed over 100,000 chip estimations. ChipEstimate.com is a property of Cadence Design Systems, Inc. (NASDAQ: CDNS), the leader in global electronic-design innovation.
      2. Reasons for missing Coreworks S.A, Lattice Semiconductor, Mentor Graphics, Inc., MIPS, Inc., and MoSys, Inc. on the ChipEstimate.com portal are quite diverse. You can find them via the additional linked explanations, typically marked as “but see”.

      Overall the summary of the Semiconductor Intellectual Property Market, Silicon IP Market (2012-2017): Global Forecasts & Analysis [MarketsandMarkets, April 2012] states that:

      The growth trend of the Semiconductor IP market revenue can be observed by the CAGRs over various time periods. The CAGR of the Semiconductor IP market from 1997 to 2002 was 17.82% while the value from 2002 to 2007 stood at 11.54%. Post 2007, the market again picked up growth and the forecasted CAGR from 2012 to 2017 is estimated to be 14.47%. In 2012, the global Semiconductor IP market is estimated to be $2.90 billion. The percentage share of Semiconductor IP industry in the global revenue for semiconductors was approximately between 0.3% and 1.0% over the years; stood at 0.71% in 2011, and is estimated to increase to 0.85% by the end of 2012 and 0.99% by the end of 2017.
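
      To make the CAGR figures above easier to work with, here is a minimal Python sketch of the standard compound annual growth rate formula. The function names `cagr` and `project` are my own, and the implied 2017 market size it prints is derived from the quoted $2.90 billion and 14.47% figures, not a number taken from the MarketsandMarkets report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Figures quoted above: $2.90 billion in 2012, forecast CAGR of 14.47% to 2017.
market_2012 = 2.90
forecast_cagr = 0.1447

# Implied 2017 market size (derived here, not quoted from the report).
market_2017 = project(market_2012, forecast_cagr, 5)
print(f"Implied 2017 market: ${market_2017:.2f} billion")

# Round trip: recovering the rate from the two endpoints returns the input rate.
print(f"Recovered CAGR: {cagr(market_2012, market_2017, 5):.2%}")
```

      The same `cagr` function can be applied to any pair of yearly revenue figures, which is how the historical 1997-2002 and 2002-2007 rates would be obtained if the underlying endpoint revenues were available.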

      In the Analyst Briefing Presentation of the same report it is stated that:

      Coming to the statistics, in 2011, the global Silicon IP Market stood at $2.25 billion, while the global semiconductor industry revenue was at $315 billion. Both these markets are estimated to reach $2.90 billion and $340 billion respectively by the end of 2012.

      which means that while the global semiconductor industry is expected to grow just 7.9% this year, the Semiconductor IP Market is estimated to grow by 28.9%! So the latter is quite healthy, although still a tiny part of the whole industry.
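
      The year-over-year growth rates and the IP share of the whole industry can be rechecked directly from the figures quoted in the Analyst Briefing Presentation. This is only a small arithmetic sketch; the variable names are mine, and the inputs are the 2011 and 2012 values cited above.

```python
def yoy_growth(previous: float, current: float) -> float:
    """Year-over-year growth as a fraction (0.10 means 10%)."""
    return current / previous - 1.0


# Quoted figures, in billions of US dollars.
semi_2011, semi_2012 = 315.0, 340.0   # global semiconductor industry revenue
ip_2011, ip_2012 = 2.25, 2.90         # global Silicon IP market

semi_growth = yoy_growth(semi_2011, semi_2012)
ip_growth = yoy_growth(ip_2011, ip_2012)

# Share of the IP market within the whole semiconductor industry.
ip_share_2011 = ip_2011 / semi_2011
ip_share_2012 = ip_2012 / semi_2012

print(f"Semiconductor industry growth: {semi_growth:.1%}")
print(f"Silicon IP market growth:      {ip_growth:.1%}")
print(f"IP share of industry, 2011:    {ip_share_2011:.2%}")
print(f"IP share of industry, 2012:    {ip_share_2012:.2%}")
```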

      Gartner presented last year the following revenue-based Semiconductor IP Market view: [image]
      Source: Synopsys Investor Day 2011 presentation, referring to a Gartner, March 2011 report

      Note that the $231.6M semiconductor IP revenue was just ~15% of the CY2010 overall revenue (~$1.5B estimated at most) of Synopsys, where Core EDA (Electronic Design Automation) was and is by far the bulk of the revenue: Core EDA revenue was $959M in FY2010 and $980.7M in FY2011. Relative to that, the overall Semiconductor IP segment was and is a double-digit growth area for Synopsys. Since the company is following a strong “M&A strategy to broaden TAM and provide incremental revenue growth” in non-Core EDA areas, its semiconductor IP revenue will probably grow at the same pace in the coming years. Its #2 position in this market will therefore be maintained, especially as it has almost no direct competitors (only Mentor Graphics IP) among the Top 10 (companies that together hold no less than 71.1% of the market), while the strongest competitor of both the #3 Imagination Technologies and the #4 MIPS Technologies is the #1 ARM Holdings.

      So overall the market is quite mature, with well-established and strong leaders already holding most of the business for themselves. The #1 ARM Holdings also has a strong ecosystem of its own, which provides opportunities for no fewer than 53 small silicon IP vendors outside the Top 10 as well. See: SoC IP [providers in ARM Connected Community Program].

      I’ve edited a more descriptive list of that in PDF, which you can download from here. Below I am providing an excerpt from it, listing the strongest players in ARM’s own ecosystem, in the sense of relying on ARM’s Artisan Physical IP via the IPNet Partner Program (denoted by +) and/or the TSMC IP Alliance Program (denoted by *):

      Analog Bits*: the leading supplier of low-power, customizable analog IP for easy and reliable integration into modern CMOS digital chips. Our product range includes precision clocking macros such as PLL’s & DLL’s, programmable interconnect solutions such as multi-protocol SERDES/PMA and programmable I/O’s as well as specialized memories such as high-speed SRAMs and T-CAMs.
      Low Power Wide Range PLL – Common Platform 32LP
      Arteris*: Arteris invented Network on Chip technology, offering the world’s first commercial solution in 2006. Arteris connects the IP blocks in semiconductors from Qualcomm, Samsung, TI, and others, representing over 50 System on Chip devices. … Arteris is a private company backed by a group of international investors including ARM Holdings, Crescendo Ventures, DoCoMo Capital, Qualcomm Incorporated, Synopsys, TVM Capital, and Ventech.
      C2C™ Chip to Chip Link™ Inter-chip Connectivity IP
      FlexNoC Network-on-Chip Interconnect IP
      FlexWay Interconnect IP
      Aurora VLSI, Inc. +: provides AMBA specification-based SoC/ASIC IP components, peripherals, subsystems, and platforms. … Aurora provides a full set of popular communications and SoC IP cores for ARM and AMBA Bus-based SoCs.
      AMBA Peripherals- Ethernet, PCI, USB, IEEE1394, memory and flash controllers, interrupt controller, timers, counters, GPIOs, etc 
      AMBA SOC Platform (Configurable)
      AuthenTec*: a leading provider of mobile and network security. … AuthenTec’s products and technologies provide security on hundreds of millions of devices, and the Company has shipped more than 100 million fingerprint sensors for integration in a wide range of portable electronics including over 15 million mobile phones. Top tier customers include Alcatel-Lucent, Cisco, Fujitsu, HBO, HP, Lenovo, LG, Motorola, Nokia, Orange, Samsung, Sky, and Texas Instruments.
      SafeXcel™ IP-06 KASUMI Crypto Core Family
      SafeXcel™ IP-115 HDCP2 Content Protection Crypto Module
      SafeXcel™ IP-123 Secure Platform Crypto Module
      SafeXcel™ IP-154 Public Key Infrastructure Cores
      SafeXcel™ IP-16 3DES Crypto Core Family
      SafeXcel™ IP-160 MACsec Security Engine w/ Classifiers
      SafeXcel™ IP-18 CAMELLIA Crypto Core Family
      SafeXcel™ IP-197 Inline Security Packet Engine
      SafeXcel™ IP-28: Public Key Accelerator Cores
      SafeXcel™ IP-3X AES Crypto Core Family
      SafeXcel™ IP-46 SNOW 3G Crypto Core Family
      SafeXcel™ IP-48 ZUC Crypto Core Family
      SafeXcel™ IP-57 HASH/HMAC Core Family
      SafeXcel™ IP-60 MACsec Frame Engine
      SafeXcel™ IP-62 MACsec/IPsec GCM Packet Engine
      SafeXcel™ IP-76 True Random Number Generator
      SafeXcel™ IP-97 Look-Aside Security Packet Engine
      CEVA, Inc.*: the leading licensor of digital signal processor (DSP) cores, multimedia and storage platforms to leading semiconductor and electronics companies worldwide. … This portfolio includes a family of programmable DSP cores, DSP-based subsystems and application-specific platforms including multimedia, audio, Voice over Packet (VoP), Bluetooth, Serial ATA (SATA) and Serial Attached SCSI (SAS).
      Application Platforms: for Mobile Multimedia Applications
      The Only Silicon-proven Programmable Solution Supporting H.264 codec up to D1 resolution! … Complete, Low-Cost Audio Solution … Complete, Single Processor VoIP Solution
      DSP Cores: The CEVA-X family of cores is based on CEVA’s latest pioneering DSP architecture. This architecture offers best-in-class performance, scalability, and lowest cost-of-development for DSP deployment … CEVA-TeakLite Architecture DSP core.
      System Platforms: Broad set of DSP peripherals extendible through APB … tailored for specific cores of the CEVA-X architecture framework … High performance multimedia platform … CEVA-TeakLite Architecture DSP subsystems
      Chips&Media,Inc. *: video codec technologies cover the full line-up of video standards such as MPEG-2, MPEG-4, H.263, H.264/AVC and VC-1 from CIF to HD resolution.
      BODA7Series-HD Video Decoder IP
      BODA9Series-Dual HD Video Decoder IP
      CODA7Series-HD Video Codec IP
      CODA9Series-Dual HD Video Codec IP
      Denali Software, Inc. +: Databahn™ products provide optimal control and data throughput for external DRAM (DDR2, DDR3, LPDDR1, LPDDR2) and Flash memory devices.
      Databahn NAND Flash Controller
      Databahn(TM) PCI Express Controller IP Core
      Databahn(TM) SDR/DDR1/DDR2/DDR3/LPDDR2 Solutions
      eMemory Technology Inc. *: focused on the development of logic embedded non-volatile memory (NVM) such as OTP, MTP, and Flash. eMemory has published 186 patents. There are over 120 companies who have implemented our technologies and IP’s worldwide.
      NeoBit
      NeoFlash
      Intrinsic-ID *: semiconductor IP and embedded software products based on Hardware Intrinsic Security. Our solutions revolve around patented Physically Unclonable Function (PUF) technology, where a secret key is extracted like a silicon biometric or fingerprint from silicon hardware directly and only when required.
      Attackers have nothing to find because no key is stored nor present in the power down state. … Headquartered in Eindhoven, The Netherlands, Intrinsic-ID was founded in 2008 as a spin-out of Royal Philips Electronics and has been deployed in Philips’ production environment.
      AES
      HMAC-SHA-256
      iRNG
      Quiddikey™ in Hardware
      SHA-256
      Kilopass *
      XPM: embedded, one-time programmable (OTP) non-volatile memory (NVM). … Over 70 customers have integrated XPM™ in over 200 designs from 180nm to 40nm. Applications range from a few hundred bits for unique ID to prevent cloning to multiple instances of 1Mb for program code storage.
      PLDA, Inc. *: a leading provider of semiconductor intellectual property (IP) specialized in high-speed interconnect protocols and technologies.
      AMBA 2 AHB to PCI Bridge
      AMBA 2 AHB to PCI Express Bridge
      AMBA 2 AHB to USB 3.0 Device
      AMBA 2 AHB to USB 3.0 Host
      AMBA 3 AXI to PCI Express Bridge
      PCI Express IP Core with AXI interface
      Rambus Inc. *: one of the world’s premier technology licensing companies specializing in the invention and design of high-speed memory architectures.
      XDR Memory: architecture … proven in high-volume, cost-competitive applications. Operating at 3.2Gbps, XDR DRAM provides 6.4GB/s of peak memory bandwidth with a single, 2-byte wide device.
      Renesas Technology America, Inc. *
      Renesas Application Specific Products: SoC Architecture for Multimedia Controller Chip. Features: Multiple ARM 9 cores, Graphic Controller on chip, USB on chip, Memory Card Interface, Standard high-performance MCU peripherals, JTAG. Easy to customize with proven architecture and IP.
      Sidense Corp. *: Sidense Corp. provides secure, dense and reliable non-volatile, one-time programmable (OTP) memory IP for use in standard-logic CMOS processes, with no additional masks or process steps required and no impact on product yield. Sidense’s patented one-transistor 1T-Fuse™ architecture  provides the industry’s smallest footprint, most reliable and lowest power Logic Non-Volatile Memory IP solution and offers an alternative solution to Flash, mask ROM and eFuse in many applications.
      SiPROM
      SLP
      ULP
      Silicon Image GmbH *+
      Multimedia Platform IP: complete system solutions for Mobile Communication including MPEG-4 Encoding and Decoding for video chat and video conferencing applications. For Multimedia the offering includes solutions for DVD Players and Set Top Boxes. Other leading edge technologies include a broad portfolio of security IPs and IP cores of professional networking applications.
      Silicon Interfaces +
      Silicon Cores – Core to the Intelligent Systems(TM): 12+ IP cores targeted to areas such as Networking, Wireless, Communication and Interconnect, and around 5+ Verification IPs using Industry standard Verification Methodology
      Sonics, Inc. *+:  a pioneer of network-on-chip (NoC) technology and today offers SoC designers the largest portfolio of intelligent, on-chip communications solutions.
      MemMax AMP: an intelligent Dynamic Random Access Memory scheduler designed for use with any AMBA AXI compliant bus fabric and memory controller.
      MemMax Scheduler: an intelligent Dynamic Random Access Memory scheduler designed for use with an OCP compliant memory controller.
      SonicsGN: Sonics’ 4th generation, configurable, on-chip network enabling the design of advanced SoC communications networks using a high-speed scalable fabric topology structure. As the industry’s highest frequency NoC available today, SGN allows SoC designers to deliver high-performance, simultaneous application processing for smart phones, mobile video and tablets.
      SonicsLX: On-chip Network contains a high performance advanced fabric with data flow services for the development of complex SoCs.
      SonicsMX: an actively decoupled, non-blocking, intelligent internal interconnect that enables designers to implement multiprocessor SoC architectures using combinations of similar or heterogeneous processing elements.
      SonicsSX: On-chip Network contains a high performance, advanced fabric and a comprehensive set of data flow services for the development of complex, multicore and multi-subsystem SoCs.
      Synopsys *+:  world leader in electronic design automation (EDA), supplying the global electronics market with the software, intellectual property (IP) and services used in semiconductor design, verification and manufacturing. … Synopsys is headquartered in Mountain View, California, and has more than 70 offices located throughout North America, Europe, Japan, Asia and India.
      DesignWare Cores: Synopsys is a leading provider of high-quality, silicon-proven interface and analog IP solutions for system-on-chip designs. Synopsys’ broad IP portfolio delivers complete interface IP solutions consisting of controllers, PHY and verification IP for widely used protocols such as USB, PCI Express, DDR, SATA, Ethernet, HDMI and MIPI IP including 3G DigRF, CSI-2 and D-PHY. The analog IP family includes Analog-to-Digital Converters, Digital-to-Analog Converters, Audio Codecs, Video Analog Front-Ends, Touch Screen Controllers and more.
      DesignWare System-Level Library: a portfolio of tool-independent transaction-level models (TLMs) for the creation of virtual platforms. Virtual platforms are fully functional software models of complete embedded systems enabling pre-silicon software development and software-driven system validation.

      As one can see, there are 18 silicon IP vendors with very strong (Artisan and/or TSMC IP Alliance) ties in ARM’s own ecosystem, and 5 of them (AuthenTec, CEVA, Rambus, Silicon Image and Synopsys) are in the Top 10 group of providers.

      With that we could finish the overall semiconductor IP market overview.


      – The CEVA case

      A lot of Silicon IP vendors are highly focussed. Probably the most successful among them is CEVA Inc. (Israel, Choice IP Partner):

      CEVA DSP – Company Introduction [cevadsp YouTube channel, Aug 4, 2011]
      CEVA, Inc. Announces Second Quarter 2012 Financial Results [CEVA press release, July 31, 2012]
      … Total revenue for the second quarter of 2012 was $13.6 million, a decrease of 6% compared to $14.4 million for the second quarter of 2011. Licensing revenue for the second quarter of 2012 was $5.4 million, an increase of 3% compared to $5.2 million reported for the second quarter of 2011. Royalty revenue for the second quarter of 2012 was $7.6 million, compared to $8.3 million reported for the second quarter of 2011. Revenue from services for the second quarter of 2012 was $0.6 million, compared to $0.9 million reported for the second quarter of 2011.
      Gideon Wertheizer, Chief Executive Officer, stated: “The second quarter was the strongest licensing quarter in more than three and a half years, driven by a strategic licensing agreement with a tier 1 handset OEM for a range of LTE handsets and the first agreement for our newest DSP, the CEVA-XC4000 for LTE-Advanced. These latest agreements bring the total LTE design wins for CEVA DSPs to date to more than 20, and form the foundation for future royalty growth. Finally, while the competitive 2G market is experiencing pricing pressure, our volume growth in the lucrative 3G market during the quarter significantly outpaced that of the overall 3G space, as low and mid-range 3G smartphones gain traction.” …
      About CEVA, Inc.
      CEVA is the world’s leading licensor of silicon intellectual property (SIP) DSP cores and platform solutions for the mobile, portable and consumer electronics markets. CEVA’s IP portfolio includes comprehensive technologies for cellular baseband (2G / 3G / 4G), multimedia (HD video, Image Signal Processing (ISP) and HD audio), voice over packet (VoP), Bluetooth, Serial Attached SCSI (SAS) and Serial ATA (SATA). In 2011, CEVA’s IP was shipped in over 1 billion devices and powers handsets from every top handset OEM, including HTC, Huawei, LG, Motorola, Nokia, Samsung, Sony and ZTE. Today, more than 40% of handsets shipped worldwide are powered by a CEVA DSP core. For more information, visit www.ceva-dsp.com. Follow CEVA on twitter at www.twitter.com/cevadsp.
      LTE-A Ref.Architecture [part of the Ceva-XC4000 product page, Feb 20, 2012]
      CEVA-XC4000 multi-mode LTE-Advanced reference architecture
      Based on multiple CEVA-XC4000 processors, CEVA offers a complete multimode LTE-Advanced reference architecture targeting LTE-A Rel-10 Cat-7. The reference architecture was developed together with mimoOn, a member of the CEVA-XCnet partner program and addresses the entire PHY layer requirements.
      Reference architecture highlights:
        • A complete LTE PHY system architecture addressing the entire PHY layer requirements of multiple standards in software including: TD-LTE-A, HSPA+ Rel-9, TD-SCDMA, WiMAX and more
        • Built around CEVA-XC4000 processors with minimal complementary hardware accelerators
        • Offers industry’s most competitive SDR platform in terms of both cost and power consumption
        • Supports maximal throughput of LTE-A Rel-10 CAT-7 UE FDD (DL: 300Mbps, UL: 100Mbps) with up to 8×4 MIMO and carrier aggregation of up to two carrier components to a total of 40MHz channel
        • High operating margins enabling customer differentiation by software
        [See also the related press release, as well as the  CEVA Continues to Dominate DSP IP Market with 90% Market Share [May 14, 2012] press release]

        CEVA is also a best case for the trend determining the future of the semiconductor IP ecosystem, especially with the above “small print” example of a reusable LTE Advanced subsystem. More about the formation of such a trend you can find in the <<sticking with the “Goliath”>> section below.


        – When sticking with the “Goliath”: ARM Holdings Plc

        Then there are a number of vendors with an ecosystem of surrounding IP partners such as ARM Holdings Plc on the higher end (which we’ve already presented in the earlier, “Market Overview” section) and CAST Inc. on the lower one.

        Let’s examine the future of the semiconductor IP ecosystem through the eyes of these two companies. What can they offer strategically to their customers? Why are customers selecting the smaller and much less influential offerings from CAST against the “industry behemoth” ARM? What does it mean for a customer to stick with one rather than the other?

        Making IP work and getting the right SoC! [Global Semiconductor Alliance (GSA) Intellectual Property blog, July 18, 2012]

        Jack Browne, Vice President, Marketing, Sonics, Inc.

        Designers defining the next generation SoCs are adding more cores in pursuit of the ever increasing user experience. Whether for pacesetting smart phones, WiFi routers, or personal medical devices, making all this IP work as intended in the SoC requires system IP. System IP includes the on-chip network, performance analysis tools, debug tools, power management and memory subsystems necessary for best in class SoCs. Whether used by the architect in the initial definition of the SoC or the layout engineer finalizing timing for place and route closure, system IP is critical to the design, ensuring that the capabilities of the SoC will meet the required end user experiences.

        For complex SoCs over 100 IP blocks may be included in a design.  Choices can be tough, with over a hundred IP vendors offering solutions, each with multiple products.  The System IP eases the design burden by supporting both IP blocks and subsystems with the necessary broad range of interface protocols, widths, frequency domains and power domains.

        System IP eases the challenges of maintaining a common software platform over multiple generations of SoC’s, built with varying IP cores and subsystems. Market research firm Semico forecasts subsystem functions for computing, memory, video, communications, multimedia, security and system resource management. The increased abstraction from subsystems gives productivity benefit (leveraging use of commercial IP blocks) as well as differentiation through the integration of in-house IP blocks with standard industry IP blocks into reusable subsystems. A computing subsystem example would be ARM’s big.LITTLE CPU clusters where ARM does most of the integration ahead of time with the designer doing final configuration of features and/or number of cores. Another example would be faster communication subsystems like LTE advanced subsystems [we have already shown CEVA’s LTE-A Ref.Architecture above as the best example for that]. By customizing a 4G LTE advanced subsystem solution with internal technology, SoC design teams can differentiate from standard IP blocks using their internal expertise while leveraging the shared R&D benefits of merchant 4G IP subsystems.

        With the increasing cost of today’s SoCs, many are designed for multiple markets where not all of the functionality of the SoC is in use.  Many also have multiple usage scenarios within a given market, e.g. music playback on our smartphone. With the importance of battery life, managing the power of a SoC, including the ability to power off unused blocks, gives the best battery life.  Today’s 28nm SoCs are using dozens of power domains and even more clock domains to meet the performance and battery life requirements. By moving to system IP supporting hardware centric control of power transitions, end users will make more use of Dark Silicon (normally powered off) for better battery life as compared to interrupt centric software power management control.

        When starting a new SoC design, your choice of system IP is a key early decision as you have now selected the on-chip network, performance analysis tools, debug tools, power management and memory subsystems available for your design.  Making the right choice can provide a 2x benefit over other choices with regard to performance, power and cost, so make an informed choice.

        Foundry and IP Business Model: Alive and Well [Global Semiconductor Alliance (GSA) Intellectual Property blog, May 16, 2012]

        Dr. John Heinlein, Vice President, Marketing, ARM Physical IP Division

        … The IP ecosystem … is diverse and vibrant, with today’s IP providers offering many IP types, spanning a wide range of power, performance and area tradeoffs.  As an example, at 45 and 40nm various industry databases list between 450-620 licensable IP blocks available.  Furthermore, the latest IP developments at 45nm and 28nm include extensive power management capabilities, cost tradeoffs and implementation options that give designers choices for their chip.  Only through this ecosystem diversity can we have the rich and competitive landscape to address the many market segments the industry serves.

        Major technology investments are occurring across the foundry space, with new leading-edge R&D investments in fundamental process technology being made.  These investments span major companies like IBM, TSMC, Samsung, GLOBALFOUNDRIES, research consortia like IMEC and even new entrants like SuVolta, all of which are driving for aggressive technologies.  Today, 32 and 28nm products are in production and many more ramping to production.  Following that, there is a range of solutions already announced at 20nm that deliver the next node of planar bulk CMOS scaling.  Furthermore, the industry has clearly shown its commitment to investing in the next wave of 20nm and 14nm solutions beyond bulk ranging from FinFET to fully depleted SOI. …

        Clean Sweep at 28nm for ARM Artisan Physical IP [GSA Intellectual Property blog, Oct 11, 2011]

        John A. Ford, Director of Product Marketing, Physical IP Division, ARM

        On October 6th, UMC announced the selection of the ARM® Artisan® Physical IP Platform for the UMC foundry sponsored IP program. This new platform for UMC’s 28nm high-K metal gate (HKMG) process is a natural continuation of the long standing relationship between ARM physical IP division and UMC. ARM Artisan IP has been successfully used in millions of SoCs produced at UMC for more than 10 years on 180nm, 130nm, 90nm, 65nm and 55nm process technologies. The addition of UMC to ARM’s family of 28nm Physical IP platforms has a larger meaning than just a high quality set of IP on a technology-leading process. ARM Artisan IP is now the only physical IP platform available at all four of the 28nm commercial foundries in the world: TSMC, UMC, GLOBALFOUNDRIES, and Samsung.

        This makes good sense considering ARM’s expertise in physical IP optimization and years of establishing early foundry engagement on advanced node IP development. ARM started work on physical IP for HKMG processes way back in 2008 with test chips and process qualification chips for IBM’s 32nmLP process. 32nmLP process was the first commercially available HKMG process and is now in high volume production at Samsung for smart phone, tablet and other applications. With millions of production SoCs at 32nm, 28nm is actually the 2nd generation of HKMG IP from ARM and includes all the critical design technique learning from 32nm development and production. ARM is deploying a full platform of standard cells, logic products, memory compilers and interface products at 28nm. Customers can benefit from being able to use consistent IP at all four foundries for the development of their SoC. With ARM’s exhaustive silicon validation process, customers have the assurance, peace of mind and confidence that only comes from using ARM IP.

        We’re not stopping there. ARM is now actively developing 20nm physical IP at both IBM and TSMC, with 5 test chips taped out starting in 2009 and several more planned for 2012 and 2013. By engaging early with foundries and developing IP in parallel with the process development, ARM ensures that designers can achieve the full entitlement of the technology, with a high degree of manufacturability. Foundries engage with ARM as a partner for early physical IP because of the long experience we have in developing physical IP on advanced process including CMOS SiON, CMOS HKMG and SOI. …

        ARM big LITTLE processing: Saving Power through heterogeneous multiprocessing and task content migration [chipestimate YouTube channel, June 18, 2012]

        Brian Jeff, Product Manager at ARM. IP Talks speaker with ChipEstimate.com at DAC 2012 in San Francisco. ARM big LITTLE processing: Saving Power through heterogeneous multiprocessing and task content migration.
        From: Enabling Mobile Innovation with the Cortex™-A7 Processor [ARM whitepaper for TechCon 2011 by Brian Jeff, Oct 15 2011]
        Market requirements for high-end mobile
        High-end smartphones require high performance applications processors and graphics processors, but instantaneous performance requirements are highly elastic. During web browsing, for example, peak performance is required when pages are first rendered, but much lower levels of processor performance are required when reading or scrolling down a page. Similarly, applications have varying levels of performance requirements, typically requiring very high performance during launch, and low to moderate levels of required performance during at least some portion of runtime. For voice calls, the level of performance required by the applications processor is quite low, even on a high-end smartphone.
        Given the wide range of required performance, it would be ideal if the phone could use a very power efficient CPU some of the time, and migrate the context to a high performance CPU at other times. ARM has been researching this idea for several years, and has specifically designed the Cortex-A7 CPU not only to ideally fit all but the high-end performance requirements of a high-end smartphone, but also to be able to connect tightly with the larger and higher performance Cortex-A15 CPU in a coherent system. When connected together through AMBA Coherency Extension (ACE) interface a Cortex-A15 CPU cluster can be connected with a cluster of Cortex-A7 CPUs in a processor complex with a single memory map, hardware managed cache coherency, and the ability to run workloads on the large CPU cluster or small CPU cluster depending on instantaneous performance requirements. This concept created by ARM is called big.LITTLE processing.

        image

        big.LITTLE Processing
        Big.LITTLE refers to the coherent combination of High Performance and Power Efficient ARM CPUs. A platform that contains both Cortex-A15 (big) and Cortex-A7 (LITTLE) can execute across a wider performance range with better energy efficiency than a single processor. Hardware coherency between Cortex-A15 and Cortex-A7 enables distinct big.LITTLE use models, either migrating context between the big and little clusters, or OS aware thread allocation to the appropriately sized CPU or CPUs. The CCI-400 cache coherent interconnect enables an extremely fast context migration between the big and little CPU clusters. Finally, software views the big and LITTLE CPU clusters identically, and transitions are managed automatically by OS power management or directly by the OS. The net result of big.LITTLE power management is a platform with the peak performance of the Cortex-A15, and average power consumption closer to the Cortex-A7. This enables significantly higher performance at lower power than today’s high-end smartphones. The concept of big.LITTLE processing is only briefly introduced here; a more complete description of the hardware, software, and system implementation of big.LITTLE processing is covered in other TechCon presentations.
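
        The cluster migration use model described above can be illustrated with a toy Python simulation. This is only a sketch under my own assumptions and not ARM's actual switcher or any OS power-management code: the class name `ClusterSwitcher`, the two load thresholds and the sample load trace are all hypothetical, chosen simply to show context moving to the big cluster under peak demand and back to the LITTLE cluster when demand drops.

```python
from dataclasses import dataclass


@dataclass
class ClusterSwitcher:
    """Toy model of big.LITTLE cluster migration (illustrative only)."""
    up_threshold: float = 0.85    # hypothetical: LITTLE load above this -> migrate to big
    down_threshold: float = 0.30  # hypothetical: big load below this -> migrate to LITTLE
    active: str = "LITTLE"        # cluster currently holding the context

    def step(self, load: float) -> str:
        """Choose the cluster for the next interval.

        `load` is utilisation in [0, 1] relative to the active cluster.
        The gap between the thresholds acts as hysteresis so the context
        does not bounce between clusters on small load changes.
        """
        if self.active == "LITTLE" and load > self.up_threshold:
            self.active = "big"
        elif self.active == "big" and load < self.down_threshold:
            self.active = "LITTLE"
        return self.active


# Hypothetical trace: idle, page render burst, scrolling, reading, idle.
loads = [0.20, 0.95, 0.60, 0.25, 0.10]
switcher = ClusterSwitcher()
trace = [switcher.step(load) for load in loads]
print(trace)
```

        In a real platform the decision would be driven by the OS power management mentioned in the excerpt, and the hardware coherency through CCI-400 is what keeps such a migration fast; neither is modelled here.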
        From: Big.LITTLE Processing with ARM Cortex™-A15 & Cortex-A7 [ARM whitepaper by Peter Greenhalgh, Sept 15 2011]
        In general, there is a different ethos taken in the Cortex-A15 micro-architecture than with the Cortex-A7 micro-architecture.  When appropriate, Cortex-A15 trades off energy efficiency for performance, while Cortex-A7 will trade off performance for energy efficiency.  A good example of these micro-architectural trade-offs is in the level-2 cache design.  While a more area optimized approach would have been to share a single level-2 cache between Cortex-A15 and Cortex-A7 this part of the design can benefit from optimizations in favor of energy efficiency or performance.  As such Cortex-A15 and Cortex-A7 have integrated level-2 caches.
        Table 1 illustrates the difference in performance and energy between Cortex-A15 and Cortex-A7 across a variety of benchmarks and micro-benchmarks.  The first column describes the uplift in performance from Cortex-A7 to Cortex-A15, while the second column considers both the performance and power difference to show the improvement in energy efficiency from Cortex-A15 to Cortex-A7.  All measurements are on complete, frequency optimized layouts of Cortex-A15 and Cortex-A7 using the same cell and RAM libraries. All code that is executed on Cortex-A7 is compiled for Cortex-A15.
        Benchmark     Cortex-A15 vs Cortex-A7 Performance   Cortex-A7 vs Cortex-A15 Energy Efficiency
        Dhrystone     1.9x                                  3.5x
        FDCT          2.3x                                  3.8x
        IMDCT         3.0x                                  3.0x
        MemCopy L1    1.9x                                  2.3x
        MemCopy L2    1.9x                                  3.4x

        Table 1 Cortex-A15 & Cortex-A7 Performance & Energy Comparison

        It should be observed from Table 1 that although Cortex-A7 is labeled the “LITTLE” processor its performance potential is considerable.  In fact, due to micro-architecture advances Cortex-A7 provides higher performance than current Cortex-A8 based implementations for a fraction of the power.  As such a significant amount of processing can remain on Cortex-A7 without resorting to Cortex-A15.
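        One way to read Table 1, assuming the energy-efficiency column means work done per unit of energy (that is, performance per watt), is to multiply the two columns to estimate how much more power Cortex-A15 draws on each benchmark. The sketch below only does that arithmetic; the interpretation and the function name are assumptions, not figures from the whitepaper.

```python
# Table 1 values: (A15 performance relative to A7, A7 energy efficiency relative to A15)
TABLE_1 = {
    "Dhrystone": (1.9, 3.5),
    "FDCT": (2.3, 3.8),
    "IMDCT": (3.0, 3.0),
    "MemCopy L1": (1.9, 2.3),
    "MemCopy L2": (1.9, 3.4),
}


def implied_power_ratio(perf_ratio, efficiency_ratio):
    """Estimate A15 power / A7 power, assuming efficiency = performance / power."""
    # eff_A7 / eff_A15 = (perf_A7 / pow_A7) / (perf_A15 / pow_A15)
    # => pow_A15 / pow_A7 = efficiency_ratio * (perf_A15 / perf_A7)
    return perf_ratio * efficiency_ratio


for name, (perf, eff) in TABLE_1.items():
    ratio = implied_power_ratio(perf, eff)
    print(f"{name}: A15 draws roughly {ratio:.1f}x the power of A7")
```

        Under this reading the faster core pays for its speed with a several-fold higher power draw, which is the trade-off the use models below try to exploit.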

        big.LITTLE Task Migration Use Model
        In the big.LITTLE task migration use model the OS and applications only ever execute on Cortex-A15 or Cortex-A7 and never both processors at the same time.  This use-model is a natural extension to the Dynamic Voltage and Frequency Scaling (DVFS) operating points provided by current mobile platforms with a single application processor to allow the OS to match the performance of the platform to the performance required by the application.
        However, in a Cortex-A15-Cortex-A7 platform these operating points are applied both to Cortex-A15 and Cortex-A7.  When Cortex-A7 is executing the OS can tune the operating points as it would for an existing platform with a single applications processor.  Once Cortex-A7 is at its highest operating point if more performance is required a task migration can be invoked that picks up the OS and applications and moves them to Cortex-A15.
        This allows low and medium intensity applications to be executed on Cortex-A7 with better energy efficiency than Cortex-A15 can achieve while the high intensity applications that characterize today’s smartphones can execute on Cortex-A15.
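        The stepping behaviour described above can be sketched as a small state function. The frequencies below are hypothetical placeholders, and the downward path (migrating back to Cortex-A7 from the lowest Cortex-A15 point) is an assumption added for symmetry; the whitepaper only spells out the upward migration.

```python
# Hypothetical operating points in MHz; real values depend on the SoC.
OPERATING_POINTS = {
    "A7": [200, 600, 1000],
    "A15": [800, 1200, 1600],
}


def step(cluster, index, want_more):
    """Return the next (cluster, index) after a request for more or less performance."""
    points = OPERATING_POINTS[cluster]
    if want_more:
        if index + 1 < len(points):
            return cluster, index + 1      # ordinary DVFS step up
        if cluster == "A7":
            return "A15", 0                # A7 is at its highest point: migrate to A15
        return cluster, index              # already at the top of A15
    if index > 0:
        return cluster, index - 1          # ordinary DVFS step down
    if cluster == "A15":
        # Assumed behaviour: fall back to the highest A7 operating point.
        return "A7", len(OPERATING_POINTS["A7"]) - 1
    return cluster, index                  # already at the bottom of A7


state = ("A7", 0)
for want_more in [True, True, True, True, False, False]:
    state = step(*state, want_more)
    print(state, OPERATING_POINTS[state[0]][state[1]], "MHz")
```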

        image

        An important consideration of a big.LITTLE system is the time it takes to migrate a task between the Cortex-A15 cluster and the Cortex-A7 cluster.  If it takes too long then it may become noticeable to the operating system and the system power may outweigh the benefit of task migration for some time.  Therefore, the Cortex-A15-Cortex-A7 system is designed to migrate in less than 20,000 cycles, or 20 microseconds with processors operating at 1GHz.
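        The cycle budget converts to time as cycles divided by clock frequency. A minimal check of the figure quoted above (the helper name is made up for illustration, and the 2GHz case is only a hypothetical second data point):

```python
def migration_time_us(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles * 1e6 / clock_hz


for clock_hz in (1e9, 2e9):
    time_us = migration_time_us(20_000, clock_hz)
    print(f"{clock_hz / 1e9:.0f} GHz: {time_us:.1f} microseconds")
```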
        big.LITTLE MP Use Model
        Since a big.LITTLE system containing Cortex-A15 and Cortex-A7 is fully coherent through CCI-400 another logical use-model is to allow both Cortex-A15 and Cortex-A7 to be powered on and simultaneously executing code.  This is termed big.LITTLE MP, which is essentially Heterogeneous MultiProcessing.  Note that in this use model Cortex-A15 only needs to be powered on and simultaneously executing next to Cortex-A7 if there are threads that need that level of processing performance.  If not, only Cortex-A7 needs to be powered on.
        big.LITTLE MP is compelling because it enables threads to be executed on the processing resource that is most appropriate.  Compute intensive threads that require significant amounts of processing performance, as their output is user visible, can be allocated to Cortex-A15.  Threads that are I/O heavy or that do not produce a result that is time critical to the user can be executed on Cortex-A7.
        A simple example of a non-time critical thread is one associated with e-mail updates.  While web browsing the user will want email updates to continue, but it does not matter if they are done at Cortex-A15 performance levels or Cortex-A7 performance levels.  Since Cortex-A7 is a more energy efficient processor it makes more sense to take a LITTLE longer, but consume less battery life.
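        The allocation rule in the big.LITTLE MP description can be illustrated with a toy placement function. This is only a sketch: the `Thread` fields and the rule itself are simplified assumptions, and a real OS scheduler would rely on measured load rather than static flags.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    compute_intensive: bool  # needs significant processing performance
    user_visible: bool       # output is time-critical to the user


def choose_cluster(thread):
    """Place a thread on the big (Cortex-A15) or LITTLE (Cortex-A7) cluster."""
    if thread.compute_intensive and thread.user_visible:
        return "Cortex-A15"
    return "Cortex-A7"


def clusters_to_power_on(threads):
    """Cortex-A7 stays on; Cortex-A15 is powered only if some thread needs it."""
    return {"Cortex-A7"} | {choose_cluster(t) for t in threads}


threads = [
    Thread("page_render", compute_intensive=True, user_visible=True),
    Thread("email_sync", compute_intensive=False, user_visible=False),
]
for t in threads:
    print(t.name, "->", choose_cluster(t))
print(sorted(clusters_to_power_on(threads)))
```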
        Finally, as a fully coherent system can create a significant volume of coherent transactions, Cortex-A15, Cortex-A7 and CCI-400 have been designed to cope with worst case snooping scenarios.  This includes the case where a Mali™-T604 GPU is connected to one of the I/O coherent CCI-400 ports and every transaction is snooping Cortex-A15 and Cortex-A7 at the same time as Cortex-A15 and Cortex-A7 are snooping each other.
        From Combining large and small compute engines – ARM Cortex-A7 [by Brian Jeff on ARM SoC Design blog, Oct 19, 2011]
        The fourth and final thing is to ensure these engines work with a regular transmission.
        We needed to ensure there was a simple software approach to controlling the big.LITTLE switch consistent with power management mechanisms already in place. Current smartphones and tablet devices make use of Dynamic Voltage and Frequency Scaling (DVFS) and multiple idle modes for individual CPU cores and IP blocks in the application processor SoC. Our implementation of big.LITTLE modifies the back end of the driver which controls the processor’s DVFS operating point (for example cpu_freq in Linux/Android). Instead of three or four DVFS operating points, the driver now is aware of two CPU clusters each potentially with three or four independent voltage and frequency operating points, extending the range of performance tuning that existing smartphone power management solutions use. A big.LITTLE CPU cluster can be operated in a pure switching mode, where only one CPU cluster is active at a time under control of the DVFS driver, or a big.LITTLE heterogeneous multiprocessing mode where the OS is explicitly controlling the allocation of threads to the big or little CPU clusters and is thus aware of the presence of the different types of cores.

        ARM Cortex-A7 launch — Intro Simon Segars, President ARM Inc [US] [ARMflix YouTube channel, Oct 19, 2011]

        image

        ARM Cortex-A7 launch — Presentation, Mike Inglis, EVP & GM ARM Processor Division [ARMflix YouTube channel, Oct 19, 2011]

        The efficiency of the ARM architecture is the reason why ARM processors use less power and occupy a smaller footprint. The Cortex-A7 processor occupies less than 0.5mm², using 28nm process technology, and provides compelling performance in both single and multicore configurations. Used as a stand-alone processor, the Cortex-A7 will deliver sub-$100 entry level smartphones in the 2013-2014 timeframe with an equivalent level of processing performance to today’s $500 high-end smartphones.

        image

        image

        Cortex-A7: Redefining Energy-Efficiency (DMIPS/mW)

        • Most energy-efficient applications processor
          - 5x the energy efficiency of mainstream phones
        • Performance to handle common workloads
          - >2x the performance of mainstream phones
        • Feature set and software compliant with Cortex-A15
          - Full backward compatibility
          - Scalable and extensible
        • Up to 20% more performance while consuming 60% less power

        From: Enabling Mobile Innovation with the Cortex™-A7 Processor [ARM whitepaper for TechCon 2011 by Brian Jeff, Oct 15 2011]
        The Cortex-A7 processor was designed primarily for power-efficiency and a small footprint. The design team based the pipeline on the extremely power efficient Cortex-A5 CPU, then added microarchitecture enhancements to increase performance and architectural enhancements to deliver full software compatibility with the Cortex-A15 CPU. These architectural enhancements include support for virtualization and 40-bit physical address space, and AMBA® 4 bus interfaces. Virtualization and large address space are unusual features for so small a CPU, but are critical to present a software view of the Cortex-A7 that is identical to the Cortex-A15 high-end CPU.
        Like the Cortex-A5, Cortex-A9, and Cortex-A8 processors that came before it, the Cortex-A7 processor is a full ARMv7-A CPU, with support for the Thumb®-2 instruction set, optional 32-bit/64-bit floating point acceleration and optional NEON™ 128-bit SIMD architectural blocks. The Cortex-A7 also includes support for TrustZone® to enable secure operating modes which are increasingly important in modern mobile OEM designs. To bring higher scalability, the Cortex-A7 is also configurable as a multicore processor, supporting 1-4 cores in a coherent cluster.
        The Cortex-A7 is a simple in-order pipeline with significant but not complete dual-issue capability; however the careful choice of design features has enabled the performance of a single Cortex-A7 core to outperform the full dual-issue Cortex-A8 CPU on some important benchmark tests like web browsing, while consuming up to 60% less power.

        image

        Cortex-A7 Microarchitecture
        The roadmap below shows the legacy of Cortex-A class CPU designs, beginning with the Cortex-A8. In that design, ARM introduced the NEON SIMD architectural extension, and implemented a 2-way superscalar CPU that brought significant performance enhancements over the single-issue ARM11™. The Cortex-A9 extended the Cortex-A8 by bringing in MPCore capability for 1 to 4 CPUs with cache coherency managed efficiently by a snoop control unit. The Cortex-A9 also introduced performance enhancements inside the core that brought a 20-30% performance increase over Cortex-A8 for a single core.
        image
        Cortex-A7 makes use of a simple 8-stage in-order pipeline, extended to include dual-issue capability on a reduced range of data-processing and branch instructions. Increased dual-issuing coupled with other microarchitectural improvements allow the Cortex-A7 to reach very good levels of performance with very low power consumption.
        image
        Other performance enhancing features include an integrated L2 cache, which reduces latency to L2 memory and external memory. The integrated L2 cache simplifies OS support as it uses system mapped registers and can be managed using CP15 operations rather than the memory mapped registers needed for an external L2 cache. Integrating the L2 cache controller also reduces the amount of area consumed by an external controller and enables a tighter integration of the controller with internal bus structures.
        The L2 cache controller itself was designed with low power in mind. The mechanism for looking up tags in the cache RAM includes consecutive tag followed by data lookup; similarly, the associativity is fixed at 8-way to balance performance against lookup energy. External requests are triggered on an L2 miss, rather than on speculative requests, to reduce energy.
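        To see why a tag lookup followed by a data lookup saves energy, it helps to count RAM array reads per access. The sketch below is a counting model only, not a description of the actual controller: it compares the sequential scheme with a hypothetical parallel scheme (introduced here purely for contrast) that reads every way's data alongside the tags.

```python
WAYS = 8  # L2 associativity stated in the text


def ram_reads_parallel(hit):
    """Hypothetical parallel scheme: tags and data for all ways are read at once.

    The hit flag does not change the count, because the data arrays are read
    before the tag comparison result is known.
    """
    return WAYS, WAYS  # (tag reads, data reads)


def ram_reads_sequential(hit):
    """Sequential scheme: tags are read first, then data only from the hitting way."""
    return WAYS, (1 if hit else 0)  # (tag reads, data reads)


accesses = [True, True, False, True]  # made-up hit/miss pattern
for scheme in (ram_reads_parallel, ram_reads_sequential):
    data_reads = sum(scheme(hit)[1] for hit in accesses)
    print(scheme.__name__, "data array reads:", data_reads)
```

        The sequential scheme costs extra latency on a hit, which is the performance-for-energy trade the text attributes to the Cortex-A7 design.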
        There are branch prediction improvements as well: the branch target instruction cache (BTIC) caches fetches after a direct branch and hides the branch shadow on tight loops.
        There are several improvements in memory system performance. The Load-Store path has been increased to 64-bits from the 32-bit path in the Cortex-A5. The external bus structure has been upgraded to 128-bit AMBA4 to improve bandwidth and introduce support for coherency extension beyond the 1-4 SMP cluster using AMBA 4 ACE.
        Energy Efficiency Features of the Microarchitecture
        There are several features of the L1 Memory system which reduce the power consumption of the CPU or the system. The merging Store-buffer after the write stage reduces data cache lookups. The 2-way set associative instruction cache trades off the slightly improved hit rate of a 4-way set associative cache for the reduced power on each lookup.
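        The effect of a merging store buffer can be illustrated by counting data cache writes with and without merging. This is a deliberately simplified sketch: the single-entry buffer and the 64-byte merge granularity are assumptions for illustration, not documented Cortex-A7 parameters.

```python
LINE_BYTES = 64  # assumed merge granularity for this sketch


def cache_writes_without_merging(store_addresses):
    """Every store performs its own data cache access."""
    return len(store_addresses)


def cache_writes_with_merging(store_addresses, line_bytes=LINE_BYTES):
    """Consecutive stores to the same line are merged into one cache access."""
    writes = 0
    current_line = None
    for addr in store_addresses:
        line = addr // line_bytes
        if line != current_line:
            writes += 1          # a new run starts, so one buffered write is issued
            current_line = line
    return writes


stores = [0, 4, 8, 12, 64, 68, 0]  # made-up byte addresses
print(cache_writes_without_merging(stores), "writes without merging")
print(cache_writes_with_merging(stores), "writes with merging")
```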
        Memory System Tuned to Minimize memory latency
        There are several performance optimizing features in the memory system. The address generation unit is shifted one stage back in the pipeline to enable a single cycle load-use penalty. The design team increased TLB size to 256 entries, up from 128 entries for the Cortex-A5 and Cortex-A9; this reduces page walks saving power and significantly improves performance for large workloads like web browsing with large data sets that span a large number of pages. Also, page tables entries can be cached in L1, improving the speed of page table walks on TLB misses. The bus interface unit has support for multiple outstanding read and write transactions. Finally, the physically indexed caches enable efficient OS Context switching.
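        The benefit of the larger TLB for workloads that touch many pages can be illustrated with a toy simulation. The model below assumes a fully associative TLB with LRU replacement and a made-up access pattern; the real TLB organisation and replacement policy may differ.

```python
from collections import OrderedDict


def count_page_walks(page_accesses, tlb_entries):
    """Count TLB misses (page table walks) for a fully associative LRU TLB."""
    tlb = OrderedDict()
    walks = 0
    for page in page_accesses:
        if page in tlb:
            tlb.move_to_end(page)        # mark as most recently used
        else:
            walks += 1                   # miss: a page table walk is needed
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return walks


# Hypothetical workload: sweep over 200 pages, ten times in a row.
workload = [page for _ in range(10) for page in range(200)]
for entries in (128, 256):
    print(entries, "entries:", count_page_walks(workload, entries), "page walks")
```

        In this contrived pattern the working set fits in 256 entries but not in 128, so the smaller TLB misses on every access while the larger one only takes the initial cold misses.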

        ARM Cortex-A7 launch — big.LITTLE demonstration, Nandan Nayampally, Director, Product Marketing [ARMflix YouTube channel, Oct 19, 2011]

        ARM Expands Processor Optimization Pack Solutions for TSMC 40nm and 28nm Process Variants [ARM press release, April 16, 2012]
        A Processor Optimization Pack solution is composed of three elements necessary to achieve an optimized ARM core implementation. First, it contains ARM Artisan® Physical IP logic libraries and memory instances that are specifically tuned for a given ARM core and process technology.
        This Physical IP is developed through a tightly coupled collaboration with ARM processor engineers in an iterative process to identify the optimal results. Second, it includes a comprehensive benchmarking report to document the exact conditions and results ARM achieved for the core implementation. Finally, it includes a POP Implementation Guide that details the methodology used to achieve the result, to enable the end customer to achieve the same implementation quickly and at low risk.
        “A single POP product can be applied to energy-efficient mobile, networking or even enterprise applications, providing a wide range of flexibility for ARM SoC partners to optimize performance and energy-efficiency while reducing risk in their designs,” said Simon Segars, executive vice president and general manager, Processor and Physical IP Division, ARM. “Only ARM can offer a complete roadmap of Processor Optimization Pack implementation solutions so deeply integrated and tightly aligned with ARM processor development activities now and into the future.”
        The summary below describes the existing and newly announced POP products for TSMC processes. ARM also incorporates the POP optimizations in hard macros of Cortex cores.
        POP availability by process technology
        Processor         TSMC 40LP   TSMC 40 LP high speed options   TSMC 40 G   TSMC 28 HPM   TSMC 28 HP
        ARM Cortex™-A5    Existing    New                             -           -             -
        Cortex-A7         New         New                             -           New           -
        Cortex-A9         Existing    New                             Existing    New           New
        Cortex-A15        -           -                               -           New           Upcoming

        ARM Announces Cortex-A15 Quad-Core Hard Macro [ARM press release, April 17, 2012]
        Power-optimized implementation of quad-core hard macro on leading 28nm process
        ARM today announced the availability of a high performance, power-optimized quad-core hard macro implementation of its flagship ARM® Cortex™-A15 MPCore™ processor.  
        The ARM Cortex-A15 MP4 hard macro is designed to run at 2GHz and delivers performance in excess of 20,000DMIPS, while maintaining the power efficiency of the Cortex-A9 hard macro. The Cortex-A15 hard macro development is the result of the unique synergy arising from the combination of ARM Cortex processor IP, Artisan® physical IP, CoreLink™ systems IP and ARM integration capabilities, and utilizes the TSMC 28HPM process.
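        As a rough sanity check on the headline figure, the total can be normalised per core and per MHz. The helper below is illustrative arithmetic only; because the release says "in excess of" 20,000 DMIPS, the result is a lower bound.

```python
def dmips_per_mhz_per_core(total_dmips, cores, clock_mhz):
    """Normalise an aggregate DMIPS figure to a single core at 1 MHz."""
    return total_dmips / (cores * clock_mhz)


# Quad-core hard macro at 2GHz delivering at least 20,000 DMIPS.
print(dmips_per_mhz_per_core(20_000, 4, 2_000), "DMIPS/MHz per core (lower bound)")
```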
        The low leakage implementation, featuring integrated NEON™ SIMD technology and floating point (VFP), delivers an extremely competitive balance of performance and power and is ideal for a wide array of high-performance computing applications, from notebooks through to power-efficient, extreme performance-orientated network and enterprise devices.
        The hard macro was developed using ARM Artisan 12-track libraries and the recently announced Processor Optimization Pack™ (POP) solution for the Cortex-A15 on TSMC 28nm HPM process. This follows the recent announcement of a broad suite of POPs for all Cortex-A series processors (see ARM Expands Processor Optimization Pack Solutions for TSMC 40nm and 28nm Process Variants, 16th April 2012).
        Full configuration and implementation details will be presented at the Cool Chips conference (18-20 April) in Yokohama, Japan. Further information is contained in an accompanying blog.
        “For SoC designers looking to make a trade-off between the flexibility offered by the traditional RTL-based SoC development strategy and a rapid time to market, with ensured, benchmarked power, performance and area, an ARM hard macro implementation is an ideal, cost-effective solution,” said Jim Nicholas, vice president of Marketing, processor division, ARM. “This new Cortex-A15 hard macro is an important addition to our portfolio and will enable a wider array of partners to leverage the outstanding capabilities of the Cortex-A15 processor.”
        See also:
        Squaring the circle – Optimizing power efficiency in a Cortex-A15 processor [Haydn Povey on SoC Design blog of ARM, April 17, 2012]
        Simplifying SoC’s with Hard Macros – New solutions for old problems [Haydn Povey on SoC Design blog of ARM, Oct 20, 2011]: “For me, the most important aspect of this talk was the public announcement of the availability of a new Cortex™-A5 Hard Macro for the TSMC 40nm Low Power node (40LP) which can achieve a whopping speed of over 1GHz in a tiny footprint of just 1mm². … there will always be partners who need the full flexibility of RTL and POPs, but there is also a group for whom having a pre-integrated and hardened ready to run solution out of the box is the best route to market.”
        –  Hard Macro Processors [ARM product page, April 17, 2012]
        The ARM Hard Macro portfolio offers performance and power optimized hard macrocell implementations of the Cortex™-A series processors. For SoC designers looking to make a trade-off between the multifaceted flexibility offered by the traditional RTL based SoC development strategy and the significant costs and efforts it involves, the ARM Hard Macro portfolio is an exciting alternative that enables higher profitability through benchmarked PPA (Performance, Power, and Area), design risk reduction and faster time to market.

        ARM Hard Macros are available in a number of different implementation options with more being added.
        Currently the following options are available.
        Processor               TSMC 40LP   TSMC 40G   TSMC 28HPM
        Cortex-A5 Single-core   X           -          -
        Cortex-A9 Dual-core     -           X          -
        Cortex-A15 Quad-core    -           -          X
        Processor Optimization Pack™ (POP) solutions targeting ARM Cortex™ processors [ARMflix YouTube channel, April 16, 2012]
        ARM Artisan Physical IP Delivers Optimized Performance and Energy-Efficiency for ARM® Cortex™-A5, Cortex -A7, Cortex-A9 and Cortex-A15 cores.

        ARM Holdings Management Discusses Q2 2012 Results – Earnings Call Transcript [Seeking Alpha, July 25, 2012]

        If I look at physical IP, the story here is our physical IP is being used right across the different sectors that ARM’s processors are used in. We’re continuing with the processor optimization package activity. It was a record quarter for POPs. The best quarter we’ve had. So total of over 32 POPs sold now, still about a 50% attach rate with Cortex-A licensees, so that’s good in terms of generating royalty for the future.

        image

        [Note that there are only 13 companies shown out of those 32 POP licensees.]

        And also good in terms of generating royalty for the future is that this quarter, we had 4 new fabless semiconductor companies adopting ARM physical IP for their 28nm designs and beyond. So that is good for royalty growth going forward.

        Note: On the very first “Q2 2012 Highlights” slide one could see the following overall split:

        image

        The overall 77% share of the processor division is composed of 31% licensing (the lighter blue) and 47% royalties. So that is a pretty mature part of the business overall, although the Mali GPU part of it is still developing:

        Let’s — I should just highlight, we’ve got on the slide, of course, millions now of Mali devices as well, are going into those Cortex-A-based chips. And as far as Mali is concerned, then we are very much on track for the 100 million-plus units that we expect to deliver this year.

        as around 180 million Cortex-A units were shipped in the first half alone (see the graph in the next excerpt from the earnings call).

        The “Revenue Split Analysis” slide from the Appendix, however, is showing that due to the steadily growing application processor business (simply indicated Processor Division, PD) the share of the Physical IP business (simply indicated Physical IP Division, PIPD) was not growing for the last four years:

        image

        With extremely high interest in upcoming technologies of 28nm and beyond, more and more Cortex licensees will (should) exploit the POP opportunity. Here is the example of MediaTek (Taiwan), the low-end SoC market leader, whose upcoming flagship products should definitely use POP as well, given such a tight delivery schedule (Cortex-A7 has been available for licensing for just 10 months, i.e. ~15 months relative to the Jan’13 SoC delivery, vs. the 2-3 years which were required previously):

        A MediaTek product roadmap leaked: Quad-core code-named MT6588 [MTK Smartphones Network (MTK手机网), July 27, 2012]
        Update: later was renamed and came to market as MediaTek MT6589 quad-core Cortex-A7 SoC with HSPA+ and TD-SCDMA is available for Android smartphones and tablets of Q1 delivery [this same blog, Dec 12, 2012]

        According to information recently obtained from an electronics forum abroad, the MT6585 code communicated earlier for the quad-core MediaTek smartphone chipset is wrong. The true model code is MT6588. It is built on the 28nm process in order to achieve a higher performance level than the dual-core MT6577 technology.
        MT6588 has a 4-core CPU [Cortex-A7 (!), see on the second slide below] clocked at 1GHz [1.XGHz rather, see the included slides below, as well the latest rumor about that being 1.7GHz or 1.5GHz], supports dual-channel at maximum 1066Mbps, has an integrated multimode modem for WCDMA [+ it is delivering HSPA+ WCDMA performance (!) vs just HSPA with MT6577/75, see the first slide below] and TD (!), that is it can support both Unicom [latest upgrade to HSPA+ service, see here] and China Mobile 3G network, supports an up to 13 MP camera and 1080P video playback. It finally has a GPU upgrade with SGX544, doubles the resolution to 1280×800 HD level, and has 32KB L1 cache and 1MB L2 secondary cache.
        Alongside the MT6588 there is a 28nm dual-core version, the MT6583, on the MediaTek 2012 product roadmap. From the chipset parameters it is evident that the MT6583 is a scaled-down version of the MT6588. It has 2 fewer cores, the camera support is 8MP, the video decoder is of 720P level, and the resolution is down to 854×480.
        It is understood that the MT6588 and MT6583 will be in production in the first quarter of 2013, early next year at the earliest.

        The MediaTek product roadmap

        MTK MT6588 chip Introduction

        MediaTek to launch quad-core smartphone solutions in 1Q13, says paper [DIGITIMES, Aug 6, 2012]

        MediaTek is expected to launch its first quad-core smartphone solution, the MT6588, in the first quarter of 2013, according to a Chinese-language Liberty Times report. The MT6588 features a quad-core 1.5GHz or 1.7GHz Cortex-A7 CPU, supporting WCDMA and TD-SCDMA technologies.

        The MT6588, which features a 13-megapixel camera, also supports 1080p video playback and a display resolution of 1280 by 800 pixels. The chip will be built using a 28nm process, the paper said.

        Additionally, MediaTek will also roll out a 28nm dual-core solution, the MT6583, during the same quarter. While the dual-core CPU of the MT6583 will also run at 1.5GHz or 1.7GHz, the chip will support a resolution of 854 by 480 pixels targeting a segment different from that of the MT6588, the paper indicated.

        Back to: ARM Holdings Management Discusses Q2 2012 Results – Earnings Call Transcript [Seeking Alpha, July 25, 2012]

        image
        One thing we are seeing is the value coming through in mobile, generally, the increasing number of smartphones, and within the smartphones themselves, an increasing number of Cortex-A products. And you can see a little histogram halfway down the slide, the top bar there is the ARM11. So ARM11 is still accounting for 40%, roughly, of the apps processors. And the Cortex-A is accounting for, roughly, 60% of the apps processors. But within that Cortex-A, you can see dual-core Cortex-A increasing significantly if you compare the situation with a year ago. And that’s good news from a value point of view for ARM as royalty, because typically these chips are more expensive. So single-core moving to dual-core and quad-core is a good trend for us. And note also, the underlying growth in sheer volume of our apps processors in smartphones. Don’t forget, with all this gloom and doom around, smartphones continues to be an area of significant growth for the business, and we’re looking forward to 30% thereabout growth in smartphones year-on-year, so for the year as a whole.
        ARM in MCU and Internet of Things
        image
        Growing standardisation around ARM in Microcontrollers
        – More than 100 companies have now licensed Cortex-M class processors mainly for microcontrollers, smart sensors and smartcards
        – Cortex-M0+ is ARM’s most energy efficient processor for microcontrollers
        Collectively, if you look at the line cards from the ARM partners, there are over 1,400 different ARM microcontroller products that you can go out and buy from ARM partners today. And that’s going to be a much bigger number by the time all of that licensing that we’ve been doing gets into silicon production.
        Earlier this year, we launched the Cortex-M0+ product … And again, at the Freescale technology forum, we saw an excellent demonstration of that power efficiency, where they literally had an ARM-powered charger, crank it up with a crank handle, charged a few capacitors up in the range of different microcontrollers and of course, the Cortex-M0+ went on and on and on. So that’s a great product.
        As far as the range of opportunities is concerned, it’s huge, and we’re starting to get design ins and as we start to get design ins, so more and more semiconductor companies are jumping onto the ARM-based microcontroller party. And they’re making these decisions in order to position themselves for the Internet of Things wave.
        image
        Internet of Things brings new opportunities
        – Combining radio technology with ARM-based microcontrollers and sensors
        – Huge range of applications, billions of opportunities
        – New products announced from Freescale, NXP and Toshiba in Q2

        In terms of volume shipments, at the moment then we saw another great quarter, where if we look year-on-year on microcontroller shipments up about 20% compared with industry shipments, up about 8%.

        Freescale: History & Future of “Internet of Things” – Design West (ESC) 2012 [ARMflix YouTube channel, March 28, 2012]
        Jim Trudeau, Solutions Technical Marketing from Freescale on the Cortex-M0+, the Internet of Things and Freescale’s Kinetis L Series
        See more: The Internet of Things, the ultimate mashup [Jim Trudeau on Software Meets Silicon blog of Freescale, April 17, 2012], published on ARM blog as “The Internet of Things, a Triad of Partners, and the Singularity of Change”
        Implementing connectivity is where a company like Motomic Software comes into play. They bring Human Machine Interface (HMI) capability to a new arena. With connectedness comes the need for HMI to get smarter, to display what we really need to know when we need to know it in better ways. Take the lowly thermostat: as simple as its task is, a traditional digital thermostat UI is typically confusing to use. A modern UI in a “learning” thermostat can be quite simple. The contrast in complexity is startling as shown in Figure 1.
        Attached Image
        Figure 1: Contrasting Digital Thermostat UI
        Motomic Embedded Software Tools for IOT – Design West (ESC) 2012 [ARMflix YouTube channel, March 28, 2012]
        Motomic tells us about embedded software tools for applications focusing on Internet of Things, plus a demo of an embedded browser and media grid. http://www.motomicsoftware.com/
        See more: A Face for the Internet of Things [Mike Gee, CEO of Motomic Software, Inc. as a guest blogger on Embedded blog of ARM, June 11, 2012 ]
        … Motomic has created two browsers. Both browse and render HTML/CSS. Motomic’s µButterfly “microbrowser” runs in as little as ~320 KB Flash and 109 KB RAM. The Butterfly “minibrowser” is based on Qt, it supports features such as TrueType fonts, anti-aliasing and alpha blending. It requires 6+ MB of Flash. The RAM requirement depends on screen size and content requirements, starting around ~1 MB.
        Attached Image
        Both leverage the very low power requirements and very small footprints of ARM’s Cortex-M0+ and Cortex-M4 microprocessors that are too small to run a web browser such as WebKit, Chrome, Mozilla, etc. These small processors can now accurately render HTML/CSS content previously reserved for higher-end processors.
        Qt on Future’s WVGA display [MotomicSoftware YouTube channel, July 9, 2012]
        Nokia Qt for Freescale’s MQX real-time operating system on Kinetis K70 @ Future Electronics’ WVGA (800×480) PIM (Passive Intermodulation http://en.wikipedia.org/wiki/Intermodulation#Passive_Intermodulation) displays …. By adding Qt to MQX, you can: develop Qt-based applications for MQX, begin with the latest prebuilt, prevalidated, preintegrated Qt version, ready for your first deployment on one or more hardware platforms—you don’t need to build Qt, add splash screens with the world’s fastest animations, deploy Qt applications to your embedded devices automatically, leverage hardware optimizations and future-proof your hardware platforms. Motomic also lets you add media to MQX, for example advertisements or instruction videos. You can add social networking, games and browser functionality to your applications and products. Motomic helps you distribute your Qt application across networks.
        Development for the IoT is also being boosted by the Embedded Software Store. Motomic’s browsers and hundreds of other components for developing embedded software are accessible. Pre-built components allow solutions to be assembled more rapidly and with lower project risk. Complex systems can now be built rapidly by adding pre-built components.
        Innovative solutions like the Embedded Software Store (source of pre-built components for embedded developers), Motomic’s browsers, and ARM’s range of processors are allowing the creativity of developers to envision and build highly innovative solutions for the Internet of Things.
        ARM Embedded Software 2.0 [chipestimate YouTube channel, June 19, 2012]
        Will Tu, Director of Business Development at ARM. IP Talks speaker with ChipEstimate.com at DAC 2012 in San Francisco.
        See more:
        Advances in technology create new problems for today’s embedded developers [Will Tu on Software Enablement blog of ARM, Oct 12, 2011]
        Solving the Challenge of Software Complexity for Today’s Embedded Developer [Will Tu on Software Enablement blog of ARM, Oct 26, 2011]
        Avnet Electronics Marketing and ARM Launch Embedded Software Store [ARM press release, Oct 26, 2011]
        … Users can choose from a broad array of reputable embedded software vendors, including ARM, CMX Systems, Inc., DSP Concepts, Micrium, Motomic, YaSSL, and others. New software vendors are invited to join the initiative on an ongoing basis. The site also offers a quick download delivery system and preview of all license agreements in advance of purchase. Users are encouraged to participate in the Embedded Software Store’s online community to create a strong ecosystem of software support for ARM technology. … The site is fully operational and accessible at www.embeddedsoftwarestore.com

        AvnetEMA and ARM Launch Embedded Software Store [AvnetEMA YouTube channel, Nov 1, 2012]

        Watch a demo of the new Embedded Software Store

        Kinetis L Series & Energy Efficiency: FTF Keynote Demo [freescale YouTube channel, July 31, 2012]

        Freescale Debuts Kinetis L Series, World’s Most Energy-Efficient Microcontrollers [Freescale press release, Jun 19, 2012]

        Freescale Semiconductor (NYSE: FSL) is now offering alpha samples of its Kinetis L series, the industry’s first microcontrollers (MCUs) built on the ARM® Cortex™-M0+ processor. Kinetis L series devices are on display this week at the Freescale Technology Forum (FTF) Americas and were demonstrated during the event’s opening keynote address.
        As machine-to-machine communication expands and network connectivity becomes ubiquitous, many of today’s standalone, entry-level applications will require more intelligence and functionality. With the Kinetis L series, Freescale provides the ideal opportunity for users of legacy 8- and 16-bit architectures to migrate to 32-bit platforms and bring additional intelligence to everyday devices without increasing power consumption and cost or sacrificing space. Applications, such as small appliances, gaming accessories, portable medical systems, audio systems, smart meters, lighting and power control, can now leverage 32-bit capabilities and the scalability needed to expand future product lines – all at 8- and 16-bit price and power consumption levels.
        The ARM Cortex-M0+ processor consumes approximately one-third of the energy of any 8- or 16-bit processor available today, while delivering between two to 40 times more performance. The Kinetis L series supplements the energy efficiency of the core with the latest in low-power MCU platform design, operating modes and energy-saving peripherals. The result is an MCU that consumes just 50 µA/MHz* in very-low-power run (VLPR) mode and can rapidly wake from a reduced power state, process data and return to sleep, extending application battery life. These advantages are demonstrated in the FTF demo, which compares the energy-efficiency characteristics of the Kinetis L series against solutions from Freescale competitors in a CoreMark benchmark analysis.
        *Typical current at 25C, 3V supply, for Very Low Power Run at 4MHz core frequency, 1MHz bus frequency running code from flash with all peripherals off.
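As a rough illustration of what the 50 µA/MHz figure implies, the sketch below multiplies it by the 4 MHz core frequency given in the footnote and estimates run time from a battery. The 220 mAh capacity is a hypothetical coin-cell value chosen only for illustration, not a figure from the press release, and the estimate ignores peripherals, sleep modes and battery self-discharge.

```python
# Rough VLPR current and battery-life estimate for the quoted Kinetis L figure.
UA_PER_MHZ = 50.0     # from the press release: 50 uA/MHz in VLPR mode
CORE_MHZ = 4.0        # from the footnote: 4 MHz core frequency
BATTERY_MAH = 220.0   # ASSUMPTION: hypothetical coin-cell capacity


def run_current_ua(ua_per_mhz: float, mhz: float) -> float:
    """Run-mode current in microamps at the given core frequency."""
    return ua_per_mhz * mhz


def battery_hours(capacity_mah: float, current_ua: float) -> float:
    """Hours of continuous operation, ignoring self-discharge and peripherals."""
    return capacity_mah * 1000.0 / current_ua


current = run_current_ua(UA_PER_MHZ, CORE_MHZ)
hours = battery_hours(BATTERY_MAH, current)
print(f"{current:.0f} uA, about {hours:.0f} hours of continuous running")
```

Under these assumptions the core draws 200 µA while running, which the assumed cell could sustain for roughly 1,100 hours. A real duty-cycled application that sleeps most of the time would last far longer.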
        Features common to the Kinetis L series families include:
          • 48 MHz ARM Cortex-M0+ core
          • High-speed 12/16-bit analog-to-digital converters
          • 12-bit digital-to-analog converters
          • High-speed analog comparators
          • Low-power touch sensing with wake-up on touch from reduced power states
          • Powerful timers for a broad range of applications including motor control
            The first three Kinetis L series families:
              • Kinetis L0 family – the entry point into the Kinetis L series. Includes eight to 32 KB of flash memory and ultra-small 4mm x 4mm QFN packages. Pin-compatible with the Freescale 8-bit S08P family. Software- and tool-compatible with all other Kinetis L series families.
              • Kinetis L1 family – with 32 to 256 KB of flash memory and additional communications and analog peripheral options. Compatible with the Kinetis K10 family.
              • Kinetis L2 family – adds USB 2.0 full-speed host/device/OTG. Compatible with the Kinetis K20 family.
                The Kinetis L series is pin- and software-compatible with the Kinetis K series (built on the ARM Cortex-M4 processor), providing a migration path to DSP performance and advanced feature integration.
                Availability and pricing
                Kinetis L series alpha samples are available now, with broad market sample and tool availability planned for Q3. Pricing starts at a suggested resale price of 49 cents (USD) in 10,000-unit quantities. The Freescale Freedom development platform is planned for Q3 availability at a suggested resale price of $12.95 (USD).
                For more information about Kinetis L series MCUs, visit www.freescale.com/Kinetis/Lseries.

                Kinetis L Series MCUs Built on the ARM Cortex-M0+ Core: What is the Plus For? [freescale YouTube channel, May 4, 2012]

                http://www.freescale.com/kinetis/lseries – This informative video will address what is new with the Kinetis L Series MCUs built on the ARM(R) Cortex(TM)-M0+ and what the plus really means.

                World’s Most Energy-efficient Processor From ARM Targets Low-Cost MCU, Sensor and Control Markets [ARM press release, March 13, 2012]

                ARM today announced the ARM® Cortex™-M0+ processor, the world’s most energy-efficient microprocessor. The Cortex-M0+ processor has been optimized to deliver ultra low-power, low-cost MCUs for intelligent sensors and smart control systems in a broad range of applications including home appliances, white goods, medical monitoring, metering, lighting and power and motor control devices.
                The 32-bit Cortex-M0+ processor, the latest addition to the ARM Cortex processor family, consumes just 9µA/MHz on a low-cost 90nm LP process, around one third of the energy of any 8- or 16-bit processor available today, while delivering significantly higher performance.
                “The Internet of Things will change the world as we know it, improving energy efficiency, safety, and convenience,” said Tom R. Halfhill, a senior analyst with The Linley Group and senior editor of Microprocessor Report. “Ubiquitous network connectivity is useful for almost everything – from adaptive room lighting and online video gaming to smart sensors and motor control. But it requires extremely low-cost, low-power processors that still can deliver good performance. The ARM Cortex-M0+ processor brings 32-bit horsepower to flyweight chips, and it will be suitable for a broad range of industrial and consumer applications.”
                The new processor builds on the successful low-power and silicon-proven Cortex-M0 processor which has been licensed more than 50 times by leading silicon vendors, and has been redesigned from the ground up to add a number of significant new features. These include single-cycle IO to speed access to GPIO and peripherals, improved debug and trace capability and a 2-stage pipeline to reduce the number of cycles per instruction (CPI) and improve Flash accesses, further reducing power consumption.
                The Cortex-M0+ processor takes advantage of the same easy-to-use, C friendly programmer’s model, and is binary compatible with existing Cortex-M0 processor tools and RTOS. Along with all Cortex-M series processors it enjoys full support from the ARM Cortex-M ecosystem and software compatibility enables simple migration to the higher-performance Cortex-M3 and Cortex-M4 processors.
                Early licensees of the Cortex-M0+ processor include Freescale and NXP Semiconductor. … The Cortex-M0+ processor is ideally suited for implementation with the Artisan® 7-track SC7 Ultra High Density Standard Cell Library and Power Management Kit (PMK) to fully capitalize on the ground-breaking low power features of the processor.
                The Cortex-M0+ processor is fully supported from launch by the ARM Keil™ Microcontroller Development Kit, which integrates the ARM compilation tools with the Keil µVision IDE and debugger. Widely acknowledged as the world’s most popular development environment for microcontrollers, MDK together with the ULINK family of debug adapters now supports the new trace features available in the Cortex-M0+ processor. By utilizing these tools, ARM Partners can take advantage of a tightly coupled application development environment to rapidly realize the performance and ultra low-power features of the Cortex-M0+ processor.
                The processor is also supported by third-party tool and RTOS vendors including CodeSourcery, Code Red, Express Logic, IAR Systems, Mentor Graphics, Micrium and SEGGER.

                Module 1: Kinetis-L Introduction and Overview of Features [AvnetEMA YouTube channel, Aug 3, 2012]

                Avnet Electronics Marketing presents a short overview of the ultra-low power, scalable, feature-rich and easy-to-use Cortex-M0+ based Kinetis-L Series (32-bit ARM functionality with 8-bit ease-of-use)

                Module 2: Kinetis-L Ultra Low-Power Features [AvnetEMA YouTube channel, Aug 3, 2012]


                More information:
                ARM Cortex-M0+: More than a low-power processor [Thomas Ensergueix on Embedded ARM blog, June 19, 2012]: “The Cortex-M0 MCU was quite unique when launched in 2009, offering a subtle mix of low-power, 32-bit performance and optimized code size, all of this packed in a very low gate count processor. … The new implementation of the very same ARMv6-M architecture with a 2-stage pipeline in Cortex-M0+ has given us 9% more performance while reducing the power consumption by around 30%.”
                Introducing the ARM Cortex-M0+ processor: The Ultimate in Low Power [ARM whitepaper by Joseph Liu, May 4, 2012]
                ARM Cortex-M0+ Takes Flight on the Wings of Freescale’s Kinetis L Series [Danny Basler from Freescale as a guest partner blogger on Embedded ARM blog, March 14, 2012]
                FTF 2012 and Everything ARM [Drew Barbier on ARM Embedded blog, Aug 1, 2012]
                The Freedom Board [Erich Styger on Software Meets Silicon blog of Freescale, July 27, 2012]: “… my Freescale Kinetis L series Freedom board arrived. … The board will be available at Element 14/Farnell. It is expected to be publicly available by the end of September 2012, and you can pre-order now. The United States Element 14 site will have the board available for a suggested resale price of $12.95 (USD). In Europe it will be about 10 Euro. …”
                Freescale ARM technology powerhouse in action [The Embedded Beat (all posts) blog of Freescale, June 19, 2012]: “Freescale has become an ARM technology powerhouse, offering the most unique and massively broad portfolio on the market today. It starts with our Kinetis portfolio, and the new Kinetis L series based on the ARM Cortex™-M0+ core, extends to the new Vybrid controller solutions [featuring a unique dual core ARM Cortex-A5 + Cortex-M4 architecture that handles both MCU and MPU tasks on a single chip] that enable rich apps in real time, and stretches to the ultimate multimedia and display solution – the scalable i.MX 6 series [based on the ARM® Cortex™-A9 architecture].”

                Continuing with the ARM Holdings Management Discusses Q2 2012 Results – Earnings Call Transcript [Seeking Alpha, July 25, 2012]

                We now have nearly 900 licenses, and so that continues to grow. The pool of licenses that are out there to generate royalties for the future. If I look at just the quarter on its own, 23 licenses in total, collection of Cortex-A licenses, including our 12th big.LITTLE licensee. So we’ve now got 12 partners signed up for big.LITTLE. At the other end of this scale, the microcontroller end, I was just talking about the Internet of Things, yes, more licensing of our Cortex-M products.

                And our new architecture, the v8 architecture, the 64-bit stuff, we’ve now got 9 v8 licensees, including the latest architecture licensee. And we’ve got this rather, it’s with — rather ill-defined horizontal axis of time going along the slide here. We are at the stage where we’ve done a lot of lead licensing now. We are approaching the first Silicon, the product launch type phase and so the 64-bit program is on track. And the interesting thing about our 64-bit architecture, it is not just about high-end computing and servers, it’s actually people talking about using it and the mobile as well, talking about using it in infrastructure applications, some of the networking applications that I talked about a moment or 2 ago.

                ARM in Networking and Servers

                • Leading networking companies choosing ARM processor technology
                  – Another v8 architecture licensee for intelligent networking applications
                  – Freescale announced their first ARM-based chip for infrastructure applications
                  – HiSilicon, LSI, TI and Xilinx have already announced ARM-based chips for networking
                … these smartphones, computers and everything, they have — they communicate and that communication means that they’re getting data from somewhere or they’re sending data somewhere. They’re sending over some data handling infrastructure. And the explosion in smartphones and more mobile computing and prevalence of the Internet is generating much more data. Some studies suggest as much as 20x as much data over the sort of 10-year period from 2010 to 2020. And clearly, if that data is handled with the existing architecture, it’s going to consume 20x as much power, which is not a very sustainable situation. If you look at all the electricity generated in the world, then IT equipment accounts for about 10% of it, and if that is going to increase by a factor of 20, then we’re going to have to build a lot more power stations. So that isn’t going to happen. People are going to look for more power efficient ways of designing this stuff, and here is the opportunity for ARM in networking. And so you see, as I mentioned a moment ago, a new v8 architecture licensee engaged in ARM in networking.
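The back-of-envelope argument in this passage can be written out explicitly. The sketch below takes the two figures from the call (IT equipment at about 10% of electricity generated, data volume growing about 20x) and, as the speaker does, assumes power scales linearly with data on an unchanged architecture. The `efficiency_gain` parameter is a hypothetical knob added for illustration and is not something quantified on the call.

```python
IT_SHARE_TODAY = 0.10   # from the call: IT is about 10% of electricity generated
DATA_GROWTH = 20.0      # from the call: about 20x more data from 2010 to 2020


def it_share_of_todays_generation(share, growth, efficiency_gain=1.0):
    """IT electricity demand as a fraction of today's total generation.

    efficiency_gain is a HYPOTHETICAL factor: 1.0 means the existing
    architecture, 20.0 means 20x less energy per unit of data handled.
    """
    return share * growth / efficiency_gain


unchanged = it_share_of_todays_generation(IT_SHARE_TODAY, DATA_GROWTH)
offset = it_share_of_todays_generation(IT_SHARE_TODAY, DATA_GROWTH, 20.0)
print(f"unchanged architecture: {unchanged:.0%} of today's generation")
print(f"with a 20x efficiency gain: {offset:.0%} of today's generation")
```

With no efficiency improvement, IT alone would need about twice today's entire electricity output, which is why the speaker treats the scenario as one that "isn't going to happen".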
                Freescale, I wasn’t there, Freescale technology forum a few weeks ago. Freescale busy announcing their extensive networking product range, switching to adopt the ARM architecture. We’ve seen similar indications from HiSilicon, LSI, TI, Xilinx and so on. Everybody is realizing that in order to get more power efficient products here, then ARM is a great solution. And it’s the same power efficiency story, which is behind ARM’s activity in servers.
                • Servers bringing new opportunities
                  – Dell launches ARM-based server with 48 quad-core chips by Marvell
                  – Calxeda demonstrated 15x power/performance improvement
                  – Canonical announces server grade software for ARM-based chips

                ARM Holdings Management Discusses Q2 2012 Results – Earnings Call Transcript, Question-and-Answer Session [Seeking Alpha, July 25, 2012]

                Unknown Analyst … you’ve been talking about 64-bits sort of v8 architecture taping out relatively soon. Maybe you could — if you could give us a bit more details on what type of products would come on the market in the next 12 months for these 64-bit, if it’s only servers and other things.

                D. Warren A. East

                … On the second question, about 64-bits, then as I said in the presentation, it’s being used across a range of different applications, including mobile and computing. Servers is a very visible application area, where as we’ve said before, our penetration in the server market is limited until such time as we deploy 64-bit solutions. And I think it’s well known that one of our early 64-bit architecture licensees is targeting server applications and so probably, you’ll see that Silicon fairly early on. If we move along and move back.

                Unknown Analyst I think, Calxeda provided some interesting milestones this quarter in terms of the server progress. I’m just wondering, whether you can talk to how you feel the progress is going there in terms of actual sort of processing. Secondly, I just wondered whether — part of interesting slide just on the multi-core effect in the quarter, I just wondered, whether you have a sense of how much of your units shipped in mobile today is actually on quad-core based devices, versus dual-core, so the impact of quad-core presumably is still to come.
                D. Warren A. East
                Okay. On Calxeda and the server activity, I really don’t have anything else to say. We’re very pleased with the progress. The data that’s coming out suggests that all the experiments that we did before and all the simulation that we did before is being proven in Silicon. And bear in mind, this first Calxeda Silicon is actually Cortex-A9 based. And so I think I said Cortex-A9 was a core we developed very much with mobile in mind. Calxeda have added System-on-Chip infrastructure to turn into a server chip but it’s still a microprocessor core that was designed for mobile. When you put that server infrastructure around the microprocessor core that’s been a bit more designed with server applications in mind, like for instance, Cortex-A15, or moving onto v8, then you’re going to see even better performance at these levels of power consumption. But we’re very pleased with the data that’s come out so far. We’re also pleased to see other ARM Silicon partners starting to get a bit more public with their activity on the servers. The dual-core, quad-core, I don’t know that I can talk specifically about numbers, but I’ll just point you to shows like Mobile World Congress and CES, where what tends to happen is that you sort of have an announcement about products 1 year, and they turn into reality the next year. And we saw in the 2011 season, a load of dual-core devices being announced and they’ve now sort of materialized into phones. And it was about a year later at these shows that we saw the quad-core products announced and so we’d expect that sort of trajectory to continue. Over and above that, some people have gone a little bit further ahead with the quad-core and they’re using it as a sort of marketing tool and saying that the quad is better than dual. It’s a bit of a marketing thing. And it’s up to us semiconductor partners to see what performance they can actually — for what performance for a given level of power consumption they can actually achieve. 
                We put it up on the slide as multi-core, and put the 2 together, because that’s really how we view it.

                Kai Korschelt – Deutsche Bank AG, Research Division
                … just on a like-for-like perspective, if you could remind us maybe of the potential royalty premium for a 64-bit versus 32-bit, please?

                D. Warren A. East

                … On 64-bit premium for — or sort of royalty premium for 64-bit, I mean this is a continuation of the trend we’ve been on for a while, where, basically, if there’s more value in the microprocessor, the royalty comes through with a higher rate. And we’ve talked about Cortex-A being sort of typically in the sort of 1.5% to 2% range, compared with pre-Cortex-A being more in the sort of 1% to 1.5% range. And that trend will continue with our v8 architecture, so it’s going to be at the higher-end of that range.
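To make the quoted royalty ranges concrete, here is a minimal sketch that applies them to a single chip. It assumes the percentage applies to the chip's selling price, and the $10 price is a hypothetical value chosen for easy arithmetic; neither the price nor any per-chip dollar figure comes from the call.

```python
CHIP_PRICE_USD = 10.00   # ASSUMPTION: hypothetical chip selling price

# Royalty rate ranges as quoted on the call, expressed as fractions.
RATE_RANGES = {
    "pre-Cortex-A": (0.010, 0.015),
    "Cortex-A": (0.015, 0.020),
}


def royalty_range(price, rates):
    """Return the (low, high) royalty in dollars for one chip."""
    low_rate, high_rate = rates
    return price * low_rate, price * high_rate


for family, rates in RATE_RANGES.items():
    low, high = royalty_range(CHIP_PRICE_USD, rates)
    print(f"{family}: ${low:.2f} to ${high:.2f} per chip")
```

On the assumed $10 chip this gives 10 to 15 cents for pre-Cortex-A designs and 15 to 20 cents for Cortex-A, with v8 designs expected, per the answer above, toward the top of the Cortex-A range.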

                ARM Holdings Management Discusses Q2 2012 Results – Earnings Call Transcript [Seeking Alpha, July 25, 2012]

                64-bit, Physical IP and FinFET

                • TSMC and ARM announce collaboration to optimise ARM’s 64-bit processors and Physical IP and TSMC’s FinFET technology
                  – Optimization of ARM’s next generation processors and TSMC’s state of the art process technology
                  – Companies’ joint work will accelerate the adoption of SoC optimized FinFET technology
                  – Allows ARM’s and TSMC’s partners to develop market leading products for high-performance and low-power applications like mobile and enterprise

                Now looking ahead to a more leading edge technologies, as I said, we had an announcement earlier this week with TSMC, and this is ARM and the biggest independent semiconductor wafer fab or foundry company in the world getting together to actually continue work that’s been ongoing together for quite a long time, in terms of optimizing their process technology, working with physical IP division to optimize our physical IP on their new FinFET process, and using our new 64-bit processor as a vehicle for that development. So it’s world leading companies getting together to work from transistors right through its microprocessors to enable our joint partners to produce world leading products.

                ARM Holdings Management Discusses Q2 2012 Results – Earnings Call Transcript, Question-and-Answer Session [Seeking Alpha, July 25, 2012]

                Unknown Analyst

                … So on the FinFETs with TSMC, can you give us, maybe a bit more comments about this? How do you think it compares with Intel 3D, or whatever they call it? And how involved your PIPD team is involved trying [ph] to transistors characteristics, absorbs transistors? And also, I think the timing has been brought forward by 1 year, I think. So that’s the first question. …

                D. Warren A. East
                Dealing with the FinFETs first. A year or so ago, when Intel took technology, we said yes. So this is something which has been around in the semiconductor industry for the last decade or more. It’s one of the ways of making transistors more efficient, but it comes with a load of associated challenges that are actually making this stuff and making them yield and that sort of holds back the semiconductor industry from taking that step. Intel took the step and announced that they’ve taken the step. They were the first ones over the gate, announcing that they were doing this. Of course, everybody else has been the same, researching it and playing with it for the best part of the last decade. And TSMC had their plans in place. They just were not choosing to go public on FinFET until they were choosing to go public. And we’ve been working with TSMC on their next-generation processes for some time. We always stood here and done presentations and talked about tape outs on 20nm, the first ARM tape out on 20nm was well over a year ago. We’ve taped out first 40nm designs already with some of these players and its R&D activity. As and when the foundry wants to make some of these things public, then they will, and that’s what TSMC have chosen to do this week. And they chose to, I guess, communicate particularly with their customers who are ARM partners by saying, “Not only are we doing some process development in the back room, but we’re also thinking about how you’re going to take this technology to market, the sort of products you’re going to build with it. You’re probably going to build ARM-based products with it, and so we’ve been working with ARM and ARM’s physical IP division to make sure that their physical IP, their microprocessors and our semiconductor process technology, works well together. And that’s all there is to it.”

                Janardan Menon – Liberum Capital Limited, Research Division
                Two questions. One is on the FinFET agreement with the TSMC, it’s on 64-bit. So I’m just wondering what plans you have on moving the 32-bit, Cortex-A15 kind of products to FinFET? Do you have another agreement with them which we don’t know about and will the timing of the introduction of that be roughly the same as the 64-bit signed?

                D. Warren A. East

                Okay. Well, let’s answer the first one. The FinFETs, yes, the announcement is, with our 64-bit processor because just as we want to work with TSMC’s most advanced process technology, they want to work with our most advanced microprocessor, making a 20nm FinFET and later, a 16nm FinFET implementation so that our 32-bit processors will form naturally out of that development activity. We’re optimizing our physical IP to build microprocessors. We just happen to be using our new 64-bit processor as the vehicle for it. The same physical IP will be very easily used to implement our 32-bit processors.
                Janardan Menon – Liberum Capital Limited, Research Division
                And with your — as part of the timescale of introductions, is that a 2014 introduction or is it ’15?
                D. Warren A. East
                Well, we have to stick with the announcements for now. And I think as and when TSMC want to make more comments on when these things are available, then they’ll make more comments. As I said, from a development point of view, we’re taping out stuff all the time. …
                Sumant Wahi – Redburn Partners LLP, Research Division
                … The second question has to do with the FinFET again. Am I doing — most of the foundries are sort of offering different known transition and in between, I assume, a FinFET would be, an option in between 20nm and probably 16nm. So my question really was that, would you be licensing FinFET technologies separately as well, or is this an exclusive collaboration with TSMC? And then is there a royalty increase coming from products based on FinFET, PIPD, so to speak? …
                D. Warren A. East
                Okay. Next question was about FinFET and whether it’s essentially a different physical IP product from ARM. And the answer is, well, it’s a different flavor. We have different flavors of our physical IP for each semiconductor process. And so a low-power version of a given node is a different physical IP bundle than a high-profile version. And the FinFET is another flavor again. So it would be an incremental licensing opportunity. But the fact that our physical IP is used, would generate the royalty opportunity. But it’s not an incremental royalty opportunity. The fact that it’s FinFET, it’s just another flavor. So if we’re going to have a 20nm low-power planar flavor and the FinFET flavor, and the chips are going to be made out of one process technology, and so the royalty opportunity is the same. …

                ARM and TSMC Collaborate to Optimize Next-Generation 64-bit ARM Processors for FinFET Process Technology [ARM press release, July 23, 2012]

                TSMC (TWSE: 2330, NYSE: TSM) and ARM today announced a multi-year agreement extending their collaboration beyond 20-nanometer (nm) technology to deliver ARM processors on FinFET transistors, enabling the fabless industry to extend its market leadership in application processors.  The collaboration will optimize the next generation of 64-bit ARM® processors based on the ARMv8 architecture, ARM Artisan® physical intellectual property (IP), and TSMC’s FinFET process technology for use in mobile and enterprise markets that require both high performance and energy efficiency.  

                … The ARMv8 architecture extends ARM low-power leadership with a new energy-efficient 64-bit execution state to meet the performance demands of high-end mobile, enterprise and server applications. The 64-bit architecture has been designed specifically to enable energy-efficient implementations. Similarly, the 64-bit memory addressing and high-end performance are necessary to enable enterprise computing and network infrastructure that are fundamental for the mobile and cloud-computing markets. 
                TSMC’s FinFET process promises impressive speed and power improvements as well as leakage reduction.  All of these advantages overcome challenges that have become critical barriers to further scaling of advanced SoC technology.  ARM processors and physical IP will be able to leverage these attributes to maintain market leadership, while the companies’ mutual customers can benefit from these improvements for their new, innovative SoC designs. …

                ARM and TSMC Sign Long-Term Strategic Agreement [ARM press release, July 20, 2010]

                ARM and Taiwan Semiconductor Manufacturing Company, Ltd. (TWSE: 2330, NYSE: TSM) today jointly announced a long-term agreement that provides TSMC with access to a broad range of ARM processors and enables the development of ARM physical IP across TSMC technology nodes. This agreement supports the companies’ mutual customers to achieve optimized Systems-On-Chip (SoC) based on ARM processors and covers a wide range of process nodes extending down to 20nm. …

                ARM and TSMC Tape Out First 20nm ARM Cortex-A15 Multicore Processor [ARM press release, Oct 18, 2011]

                ARM and TSMC (TWSE: 2330, NYSE: TSM) today announced that they have taped out the first 20nm ARM® Cortex™-A15 MPCore™ processor. The two companies completed the implementation from RTL to tape out in six months using TSMC’s Open Innovation Platform® (OIP) 20nm design ecosystem.

                Building on this tape out, ARM will optimize its physical IP technology to specific TSMC 20nm process technologies for Power, Performance and Area (PPA), driving the specification of the Cortex-A15 Processor Optimization Pack (POP). TSMC’s 20nm process provides more than a 2X performance increase over preceding generations.

                FINFET: Has its time finally come for a sub – 20nm 3D device? [Jean Luc Pelloie Fellow Director of SOI Technology on the ARM SoC Design blog of ARM, Dec 21, 2011]

                … As we move to 20nm and beyond process technology, Fin-FET design may earn its place as the technology path of the future. … Fin-FET or tri-gate may be implemented on either bulk or SOI wafers.  … There is still work to be done, i.e. variability is expected to be different between SOI and bulk versions and needs to be quantified; … However, 3D devices are clearly on the road for sub-20nm nodes…and Fin-FET’s time may finally be here.

                Firms Rethink Fabless-Foundry Model [SemiMD (Semiconductor Manufacturing and Design), July 31, 2012]

                TSMC, for one, plans to accelerate its finFET efforts. Originally, TSMC planned to introduce finFETs at 14nm by late 2014. Now, the company has no plans to brand its finFETs at 14nm, but rather it will introduce the technology at 16nm. TSMC’s finFET “risk production” is slated for the end of 2013 or early 2014, with production scheduled for the second half of 2015, Chang said.

                Taiwan Semiconductor’s CEO Discusses Q2 2012 Results – Earnings Call Transcript [Seeking Alpha, July 19, 2012]

                … our 20 nanometer SoC, we believe, is fully competitive with industry leaders, other companies’ 22 nanometer for the served available markets that we serve. For our markets, we believe our 20 SoC is fully competitive with anyone’s 20 nanometer or 22 nanometer offering.
                And, one important point to make is that our 20 nanometer has the industry’s leading metal pitch of 64 nanometers. Our leading competitors have 80 nanometer metal pitch. That allows an advantage in the device’s density and die size.
                Now, as for the timing, we expect our 20 nanometer technology to be qualified by the end of this year and will be ready to support customers (inaudible) in Q1 of 2013.
                Now today, last time I mentioned that we will have a FinFET product after 20 SoC. And today, I’m glad to say that we have been planning the 16 nanometer FinFET. Right after our 20 nanometer (inaudible), which is the 20 SoC, we will offer FinFET at 16 nanometer for significant active power reduction. We expect to achieve speed and density, speed and logic density levels comparable to industry’s leading players 14 nanometer FinFET.
                So, we expect our 20 SoC to be competitive with competitors’ 22 nanometer or 20 nanometer products and we expect our 16 nanometer FinFET to be competitive with our competitors’ 14 nanometer FinFET products. You might ask why are we calling it 16. The only reason, in fact, until two days ago, we were undecided on whether to call it 14 or 16 FinFET. Now the only reason we decided to call it 16 FinFET is first, we want to be somewhat modest; second, we are told quite a few major customers ask the 16 FinFET, that designation and we didn’t want to confuse our customers by now switching to 14. But we expect it to be competitive with other people’s 14 nanometer offerings.
                Now 16 nanometer FinFET, our 16 nanometer FinFET, is expected to deliver about 25% speed gain given the same standby power over the 20 nanometer SoC. It is expected to give 25% to 30% power reduction at the same speed and the same standby power, and for mobile products, it is expected to give 10% to 20% speed gain at the same total power. As for timing, we expect it to be about one year after 20 SoC namely it should be ready for risk production at the end of 2013 or early 2014, about one year later than the 20 SoC.
                [from Q&A session]
                20-SoC which is 20-nanometer will ramp in 2014. And we believe that the 16 FinFET will ramp in, perhaps the second half of 2015. …
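To make the percentages quoted in the call concrete, here is a minimal sketch that applies them to a made-up 20SoC baseline. Only the percentage ranges come from the transcript; the baseline clock and power (`BASE_CLOCK_GHZ`, `BASE_POWER_W`) are hypothetical values chosen purely for illustration.

```python
# Hypothetical 20SoC baseline; these two numbers are NOT from the transcript.
BASE_CLOCK_GHZ = 1.5
BASE_POWER_W = 1.0


def scale(value, percent):
    """Return value changed by the given percentage (negative = reduction)."""
    return value * (1 + percent / 100.0)


# "about 25% speed gain given the same standby power"
speed_same_standby = scale(BASE_CLOCK_GHZ, 25)

# "25% to 30% power reduction at the same speed and the same standby power"
power_same_speed = (scale(BASE_POWER_W, -30), scale(BASE_POWER_W, -25))

# "10% to 20% speed gain at the same total power" (mobile products)
mobile_speed = (scale(BASE_CLOCK_GHZ, 10), scale(BASE_CLOCK_GHZ, 20))

print(f"Same standby power: {speed_same_standby:.3f} GHz")
print(f"Same speed: {power_same_speed[0]:.2f} W to {power_same_speed[1]:.2f} W")
print(f"Same total power (mobile): {mobile_speed[0]:.2f} GHz to {mobile_speed[1]:.2f} GHz")
```

The three figures describe different operating points, so they should not be combined into a single "faster and lower power" claim.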

                – When sticking with a “David”: CAST Inc.

                Decreasing Risk When Selecting Third-Party Semiconductor IP (49th DAC) [castcores YouTube channel, July 17, 2012]

                In this presentation captured live at the 49th DAC (June 4, 2012), CAST president Hal Barbour describes ways electronic circuit and system designers can help ensure project success through careful selection of IP cores. Specific examples in the talk are drawn from CAST’s 18 years of semiconductor IP experience and include 8051 MCUs, H.264 and JPEG 2000 compression, and effective customer support for IP users. See more of CAST’s low-risk ASIC and FPGA IP product line and learn about the company at http://www.cast-inc.com. Or jump to these cores mentioned in the talk: • 8051 MCU – http://www.cast-inc.com/ip-cores/8051s/r8051xc2/index.html • H.264 Video – Encoder http://www.cast-inc.com/ip-cores/video/h264-mp-e/index.html • J2K Encoder – http://www.cast-inc.com/ip-cores/images/jpeg2k-e/index.html

                Leapfrogging The Competition Through Smart IP Selection [GSA Intellectual Property blog, March 30, 2012]

                Nikos Zervas, VP of Marketing, CAST, Inc.

                The adoption of a reliable design reuse methodology, proliferation of high-quality IP products, and shake-out of the most untrustworthy IP vendors creates a situation offering a huge potential advantage to system integrators and product designers looking to jump ahead of their competition.

                Instead of choosing the same big-vendor, star IP that most competitors may pick by default, smarter firms will seek out and commit to what might be technically-superior IP products from smaller vendors/partners who will offer both deeper and broader service and support.

                A good example is regarding microprocessors and controllers, the heart of most systems and usually the first, most critical system design choice.

                Consider a deeply embedded system that needs the power of a 32-bit processor. Much like that saying from the 1980s that when choosing PCs “nobody gets fired for buying an IBM,” choosing a processor from the leading processor company is probably the easiest, safest choice, and it’s certainly an undeniably fine product with an extremely effective ecosystem. But making this choice might mean missing an opportunity for differentiation in a competitive market where every advantage is required for success.

                The IP portal sites list many 32-bit processor core options beyond the leading processor company, with Chip Estimate and Design and Reuse each returning nearly 300 results for such a search. More significantly, I count almost 30 different providers of these products. Certainly some of these vendors offer a product, support, or licensing terms—or perhaps even all three—that could give the smart designer a critical edge.

                Six of these stand out as being especially popular based on my recent visits with designers in California and Asia:

                • the AndesCore from Andes Technology,
                • the BA22 developed by Beyond Semiconductor and available from CAST, Inc. (disclosure: I work for CAST),
                • the ColdFire from IPextreme
                • the eSi-3250 from EnSilica,
                • the LEON3 from Aeroflex Gaisler, and
                • the MIPS 4KS and others from MIPS Technologies.
                How can you determine if options like these have sufficient benefits to outweigh the risk of not going with the leading processor company? Comparisons can be tricky, but there are a few key points to start with.
                The technical suitability and potential advantages of course depend on the detailed needs of your system. A good IP sales team will help you articulate the relevant characteristics of your project and make sure their product will work well before selling it to you.
                Quick comparisons of performance and operating characteristics are made easier through the publication of well-accepted power consumption and speed measures, like the CoreMark performance and CSiBC code density standards. Be sure, however, to look deeper to fully understand the specific configuration and technology details behind each vendor’s figures compared to that of your own target system.
                Ecosystems for programming and system development aids are a hot processor marketing topic. Be sure that the basics are covered: effective software programming tools such as the GNU tool chain, JTAG debugging, and ports of the RTOS or OS you want to use. A graphical IDE, support from tool vendors like Keil or Lauterbach, and eval/dev board kits are extras that can help further accelerate development.
                Licensing terms and actual costs can vary dramatically. For example, some vendors rely on royalty streams for their profits, while others have simpler up-front licensing fees with no royalties. What’s best for you depends on your specific product and market plans.
                Finally, credibility of the processor and the vendor are both crucial. For the former, look to successful use by other customers with applications similar to your own. For the latter, look for business longevity and general reputation, backed by your own experiences with the provider’s sales and engineering people. Try to extrapolate from a vendor’s pre-sale support how effective their integration help and other technical support services will be after you purchase from them.
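The licensing point above (royalty streams versus up-front fees) comes down to a break-even volume. The sketch below is one way to estimate it. Every fee in it is a hypothetical placeholder, not a figure from any vendor named in the article, and amounts are kept in integer cents to avoid rounding surprises.

```python
def break_even_units(flat_fee_cents, royalty_fee_cents, royalty_per_unit_cents):
    """Smallest volume at which a royalty-free license costs no more than
    a royalty-bearing one.

    flat_fee_cents:         up-front fee of the royalty-free license
    royalty_fee_cents:      up-front fee of the royalty-bearing license
    royalty_per_unit_cents: per-unit royalty of the royalty-bearing license
    """
    if royalty_per_unit_cents <= 0:
        raise ValueError("per-unit royalty must be positive")
    extra_upfront = flat_fee_cents - royalty_fee_cents
    if extra_upfront <= 0:
        return 0  # the royalty-free license is cheaper from the first unit
    # Integer ceiling division: round up to whole units.
    return -(-extra_upfront // royalty_per_unit_cents)


# Hypothetical example: $200,000 flat fee versus $50,000 plus $0.05 per unit.
units = break_even_units(
    flat_fee_cents=20_000_000,
    royalty_fee_cents=5_000_000,
    royalty_per_unit_cents=5,
)
print(f"Break-even volume: {units:,} units")
```

Below the break-even volume the royalty-bearing license is cheaper in this simple model; above it the flat fee wins. Real agreements often add tiers, caps, or support fees that this sketch ignores.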

                The examples of 32-bit processor alternatives I listed earlier all compare favorably with the leading processor company’s products in these factors; any might be the one to give you the extra technical, timeframe, or cost edge you need to make your product more competitive.

                The same is true of most other areas of semiconductor IP. Now that our industry embraces the use of third-party IP, the smartest designers will get a major payback from putting up-front effort into investigating the very best IP for their specific needs, whether that initially seems like the “safe” choice or not.

                (Note: all trademarks and registered trademarks mentioned here are the property of their respective owners.)
                About Nikos Zervas
                Nikos is the VP of Marketing for CAST, Inc. Before joining CAST in 2010, Nikos was a co-founder, chairman, and CEO of video/image SIP vendor Alma Technologies, SA [Pikermi, Greece]. He has been a member of the board for the Hellenic Silicon Industry Association since 2009, and he is a senior member of IEEE. Nikos holds BA and PhD degrees in Electrical and Computer Engineering from the University of Patras, Greece, and has published over forty papers in referenced journals and international conferences.
                Additional information:
                AndesCore™ from Andes Technology (founded in Taiwan in 2005) with AndeStar™ ISA:
                AndeStar is a patent-pending 16-bit/32-bit mixed-length instruction set to achieve optimal system performance, code density, and power efficiency.
                Freescale™ ColdFire Architecture IP
                Our extensive collection of ColdFire IP gives you the flexibility to choose the best solution for your cost/performance requirements while benefiting from the huge ecosystem of development resources available for the ColdFire architecture. Deployed in over 500 million devices worldwide, ColdFire is one of the world’s most widely-used 32-bit processor architectures. And the modern implementations of the ColdFire architecture, proven in devices from Freescale Semiconductor and available as synthesizable IP from IPextreme, provide performance and reliability that rival any similarly featured 32-bit processor IP.
                All ColdFire cores feature a variable-length RISC architecture for compact code and are supported by an extensive collection of development systems, tools, libraries, and operating systems from Freescale and several third-party commercial and open-source providers.
                Beyond BA22 Processor [Beyond Semiconductor web page, Dec 17, 2007] from privately held Slovenian fabless semiconductor IP company Beyond Semiconductor sold, supported, and built within platforms by CAST Inc. worldwide:
                Beyond BA22 Processor is the first implementation of the Beyond BA2 Architecture processor. Its main design goal was to minimize code size, gate and flip-flop count while obtaining similar performance to the Beyond BA12 processor. The processor is extremely configurable, allowing for a variety of size/performance trade-offs.
                Note: more Beyond BA22 related information is given later on as part of the CAST-related information
                eSi-3250 – 32-bit, high-performance CPU [EnSilica (UK) web page, Oct 11, 2009]
                EnSilica’s eSi-3250 CPU IP core is a high-performance processor ideal for integration into ASIC and/or FPGA designs with off-chip memories. The eSi-3250 is suited to a wide range of applications including running complex operating systems such as Linux.
                Scalability
                For applications that do not require off-chip memory, the smaller eSi-3200 is available. For even simpler applications that do not require 32-bit performance or more than 64kB of memory, the eSi-1600 16-bit processor can be used. All of the eSi-RISC processors’ RTL and toolchains share a common code base, resulting in an easy migration path for both software and hardware developers, should the demands of an application change.
                LEON3 Processor [Aeroflex Gaisler (Sweden’s Gaisler acquired by US based Aeroflex) webpage, March 28, 2005]
                The LEON3 is a synthesisable VHDL model of a 32-bit processor compliant with the SPARC V8 architecture. The model is highly configurable, and particularly suitable for system-on-a-chip (SOC) designs. The full source code is available under the GNU GPL license, allowing free and unlimited use for research and education. LEON3 is also available under a low-cost commercial license, allowing it to be used in any commercial application at a fraction of the cost of comparable IP cores.
                MIPS32® 4KS™ Family [MIPS web page, Feb 28, 2003]
                The MIPS32® 4KSd™ secure data core is a high-performance processor that meets the needs of emerging secure data applications and the stringent power, security and size requirements for smart cards. This core has the performance required to implement software programmable cryptography without the need of a coprocessor, reducing SoC size and power consumption. The 4KSd core is the most secure, licensable, 32-bit processor available.
                End of additional information

                ChipEstimate.com DAC 2012 IP Talks presenter Nikos Zervas [chipestimate YouTube channel, June 21, 2012]

                Nikos Zervas, VP of Marketing, CAST. IP Talks presenter with ChipEstimate.com IP Talks at DAC 2012 in San Francisco. Leapfrogging Your Competition Through Smart IP Selection. For more information about CAST, go to: http://www.chipestimate.com/prime-partner/140/CAST-IP-Catalog

                Additional information:

                Meet Our New VP of Marketing [IP Notes from CAST, Inc., Sept 9, 2010]
                We’re very pleased to announce our new Vice President of Marketing, Nikos D. Zervas.
                Why did you join CAST?
                CAST has an industry reputation for being an IP vendor customers can really trust, with solid products and great support. Solving difficult technical challenges still excites me, of course, but my nine years working alongside CAST have shown me that having a passionate drive to help customers then earning the satisfaction of seeing those customers succeed can be just as rewarding.
                When the opportunity arose to join the impressive team at CAST, help grow the company, and further the ideal of easier design through IP, it seemed like the right time in my career for just such a move.
                What trends do you see for the IP market over the next year?
                Design reuse has become accepted for reducing risk and minimizing time to market. With this acceptance, and the fast-increasing rates of design complexity growth and design cycle shrinkage, I believe designers will move beyond specific functional cores to seek broader IP systems and complete solutions, like CAST’s recent H.264 Reference Design System. I think CAST is well positioned to supply this need, and that I can help them succeed with this next stage of growth.

                CAST Interview at DesignCon 2012 [castcores YouTube channel, Feb 14, 2012]

                EDACafe’s Graham Bell interviews CAST VP of marketing Nikos Zervas. Nikos discusses the new BA22 32-bit processor (http://bit.ly/ba22-32bit), subject of a successful design seminar and booth demos at the show, as well as other recent new cores and the firm’s extreme commitment to customer support. Learn more about CAST at http://www.cast-inc.com.
                Fast JPEG Encoder Core from CAST Used in Fastec TS3 High-Speed Camera [CAST press release, March 6, 2012]
                Fastec Imaging Corporation has incorporated a JPEG Encoder IP Core from CAST, Inc. in its groundbreaking TS3™ line of handheld, high-speed digital cameras.
                Sourced from long-time CAST partner Alma Technologies SA, the JPEG-E Encoder Core is one of the fastest-available baseline JPEG compression cores. This enables extremely competitive functionality for Fastec’s TS3 high-speed digital cameras, including capture of 1280 x 1024 pixel images at 500 frames per second, or 800 x 600 at 1,250 fps.
                “The quality of the core plus CAST’s determination to see us succeed were both instrumental in bringing our groundbreaking handheld high-speed camera, the TS3, to market on time and on spec.,” said Bob Sefton, principal FPGA design engineer at Fastec. “The JPEG encoder’s features and excellent performance were as specified, and the system integration was so easy I didn’t need CAST’s technical support services.”
                The encoder core supports the Baseline Sequential DCT mode of the JPEG standard and is suitable for still-image or motion-JPEG capture. This third-generation core offers very fast JPEG compression, up to 750 MSamples/sec in a 65nm technology, yet is compact enough to fit low-cost FPGA devices.
                A bit-rate control option further benefits bandwidth-limited applications. “We envisioned demanding customer applications like Fastec’s when designing the JPEG encoder,” said Spyros Theoharis, vice president of products and technology at Alma Technologies. “It’s exciting to see yet another customer release of such a remarkable product using our technology and CAST’s support.”
                The JPEG-E core is part of a comprehensive family of image and video IP cores offered by CAST.
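As a rough sanity check on the capture modes quoted in the press release, the sketch below computes the raw pixel rate of each mode and compares it with the 750 MSamples/sec peak quoted for the core. Two assumptions are mine rather than the release's: one sample per pixel (a color format would need more samples per pixel), and the peak figure is used as-is even though it is quoted for a 65nm technology while the TS3 is described as an FPGA design, where the achievable rate may differ.

```python
def pixel_rate(width, height, fps):
    """Raw pixels per second produced by a capture mode."""
    return width * height * fps


# Peak throughput quoted for the core in a 65nm technology.
CORE_PEAK_SAMPLES_PER_S = 750_000_000

# Capture modes quoted in the press release: (width, height, frames per second).
MODES = {
    "1280x1024 @ 500 fps": (1280, 1024, 500),
    "800x600 @ 1250 fps": (800, 600, 1250),
}

rates = {name: pixel_rate(*mode) for name, mode in MODES.items()}

for name, rate in rates.items():
    ratio = CORE_PEAK_SAMPLES_PER_S / rate
    print(f"{name}: {rate / 1e6:.1f} Mpixel/s (quoted peak / rate = {ratio:.2f})")
```

Under the one-sample-per-pixel assumption both modes sit below the quoted peak, which is consistent with the release's claim but does not by itself show how the camera actually uses the core.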

                A First look at the Fastec TS3 Camera [FastecImaging YouTube channel, May 30, 2012]

                TS3 slow motion nature footage shot at 720p at 718fps. Footage includes moving water, falling rocks and leaves in slow motion. Filmed and edited by Tom Guilmette.

                Butterflies caught on High Speed Camera [FastecImaging YouTube channel, May 30, 2012]

                Beautiful slow motion footage of colorful Butterflies caught on the Fastec TS3 High Speed Camera by Tom Guilmette at a local greenhouse

                Mentos and Diet Coke Geyser in Slow Motion [FastecImaging YouTube channel, July 6, 2012]

                Mentos and Diet Coke geyser shot with a Fastec TS3 100 High Speed Camera at 700 frames per second.
                The New Handheld TS3 100 High-Speed Camera [Fastec Imaging press release, July 10, 2012]
                Fastec Imaging, a leading global manufacturer of digital high-speed video cameras has, once again, taken the high-speed imaging world by storm with the release of the revolutionary new TS3 100 handheld high-speed camera. This portable, affordable, battery operated camera puts all the power of a high end, high-speed camera, in the palm of your hand!
                “We wanted to create a high-speed camera that was going to be easy to use, versatile and very portable, unlike many of the other cameras in this field,” explains Steve Ferrell, President of Fastec Imaging. “The TS3 combines the power, speed, resolution and light sensitivity of our renowned HiSpec camera line with the portability and ease of use of our previous handheld ‘point and shoot’ high speed cameras. The result is a completely portable and intuitive high-speed camera with the ease of use of a DSLR.”
                The TS3 100 captures 500 frames per second (fps) at 1280 x 1024 pixels and over 20,000 fps at reduced resolutions, making it the perfect high-speed camera for broadcast, research and industrial applications. Featuring a built-in 7’’ high resolution touchscreen LCD, the TS3 allows for instant playback of footage out in the field.  Combine that with an industry leading 4 hour battery, and it is easy to see why the TS3 100 is quickly becoming so popular.
                Unlike any other high speed camera on the market today, the TS3 100 offers unmatched versatility.  Not only is it an intuitive point-and-shoot handheld camera, but it can also be controlled over Gigabit Ethernet via a PC or Mac, or even over the Internet using a standard web browser for long distance control. The TS3 also features both USB ports and SD ports allowing users to easily download images to thumb drives, SD cards, or portable hard drives. Additionally, an optional built-in SSD (Solid State Drive) provides for up to 256GB of non-volatile internal storage. This allows for shooting all day long without having to download to a computer.
                “The response to the TS3 has been overwhelming”, says Ferrell. “Its ease of use and affordability makes the TS3 one of the most accessible high-speed video cameras on the market and a perfect solution for researchers and manufacturers as well as TV and film producers.”
                For more information about the TS3 and other Fastec products, visit the web site at www.fastecimaging.com.

                Beyond BA22 Processor [Beyond Semiconductor webpage, Dec 17, 2007] from privately held Slovenian fabless semiconductor IP company Beyond Semiconductor:
                Beyond BA22 Processor is the first implementation of the Beyond BA2 Architecture processor. Its main design goal was to minimize code size, gate and flip-flop count while obtaining similar performance to the Beyond BA12 processor. The processor is extremely configurable, allowing for a variety of size/performance trade-offs.
                Embedded Processor Cores [Beyond Semiconductor webpage, May 7, 2007]

                ARM9™, ARM11™, ARM Cortex™-A9 and ARM Thumb®-2 are registered trademarks of ARM Holdings PLC.
                OpenRISC [Beyond Semiconductor webpage, Sept 1, 2007]
                Product Status – Obsolete
                OpenRISC was an open source hardware RISC CPU designed by Damjan Lampret, one of the contributors of OpenCores, released under the GNU Lesser General Public License. The OpenRISC OR1000 and OR1200 are no longer under active development, and are not recommended for new products.
                Beyond Semiconductor can provide commercial support for OR1000 and OR1200 processors.
                The Beyond BA12 Embedded Processor is an up-to-date, fully supported commercial version of OpenRISC, including many enhancements, integrated software development tool suite, development platforms and software debug tools.
                CAST and Beyond Semiconductor enter 32-bit Processor Core Partnership [joint press release, June 3, 2011]
                CAST to sell, support, and build platforms around the BA22 processor IP core from Beyond Semiconductor
                San Diego, CA – June 3, 2011, 48th DAC – Semiconductor intellectual property (IP) provider CAST, Inc. has reached an agreement with Beyond Semiconductor by which CAST will provide Beyond Semiconductor’s BA22 processor core worldwide.
                The BA22 is a fast, compact, power-saving, 32-bit RISC processor that CAST will offer without royalties. These capabilities plus easy development and integration features make the processor an excellent step up for CAST’s large base of 8-bit 8051 customers who need more processing power. In fact, the BA22’s programming code is so efficient that systems using it may require less silicon area than an 8051 with its respective code and memory.
                CAST will package the affordable BA22 with peripheral controllers and other essential IP. The initial focus is on deeply embedded systems; later platforms will exploit the processor’s scalability and performance potential to support broader applications.
                The platform approach gives customers a ready-to-use processor subsystem, and eases the transition to 32-bit processing for designers accustomed to similarly configured 8051 IP cores.
                “The 8051 is still a good choice for many chips, but our experience with customers incorporating data-intensive functions like touch-based interfaces and high-res video makes it clear they really need a good 32-bit embedded processor,” said Bill Finch, CAST’s senior vice president for sales. “The silicon-proven BA22’s performance, tiny code footprint, and mature development tools make it a great choice for many new systems, while our 15 years of microprocessor IP experience and very attractive business model make CAST a great 32-bit processor provider.”
                “CAST has a long track record as a smart, effective, customer-focused IP team that makes them a perfect match for our products,” said Matjaz Breskvar, chief executive officer of Beyond Semiconductor. “Working with them will enable us to bring highly customizable Beyond BA22 to new designers across the world while providing ease of use and excellent customer support.”
                Limited availability of the BA22 from CAST begins now, with a full product roll out in the next quarter. IP integration services are also available.
                Learn more by visiting http://www.cast-inc.com/beyond or emailing beyond@cast-inc.com. Participants in the 48th DAC in San Diego, June 5–8, are welcome to stop by CAST’s booth (2217) to see a demo and discuss the advantages of the BA22.
                About Beyond Semiconductor
                Beyond Semiconductor is a privately held fabless semiconductor IP company. Its comprehensive product offering features 32-bit embedded RISC/DSP processors with the highest code density in the industry. For more information, visit http://www.beyondsemi.com.
                About CAST, Inc.
                CAST, Inc. is a privately held company that provides semiconductor IP products and services. The company features advanced image/video processing and microcontroller IP families, plus the memory controllers, high-speed buses, peripherals, and other functions needed to build complete systems. Learn more at http://www.cast-inc.com/.

                Background information:

                CAST IP for ASICs and FPGAs: Introduction and Overview [CAST presentation on SlideShare, July 2002], only images for certain slides are included below

                BA22-AP: BA22 32-bit Application Processor [CAST datasheet, June 20, 2012]
                Implements a 32-bit RISC processor for demanding embedded applications that use off-chip instruction and data memories and that may need to run a real-time operating system (RTOS) or a full operating system such as Linux or Android. Part of the royalty-free BA22 family, this processor core is extremely competitive in terms of high performance and low power consumption, and has best-in-class code density.
                The core has Instruction and Data Memory Management Units (MMUs) and Caches, dedicated buses for on-chip instructions and data memories, and an AMBA® AHB™ or Wishbone system bus interface. Optional floating point, divider and multiply–accumulate units benefit DSP applications. The core includes up to 32 general purpose registers (GPRs), a tick-timer (TTimer), a programmable interrupt controller (PIC), an advanced power management unit (PMU), and an optional debug unit (DBGU). Additional microcontroller peripherals may be ordered for pre-integration and delivery with the core, individually or in a complete platform. IP Integration Services are also available to help integrate any BA22 processor configuration with memory controllers, image compression, or other CAST IP cores.
                The processor’s BA2 instruction set is relatively simple and extremely compact. Programming is facilitated with the included C/C++ tool chain; Eclipse IDE; architectural simulator; and ported C libraries, RTOSs, and OSs.
                The BA22-AP synthesizes to 35k gates in a 90nm technology, can be clocked at more than 450MHz in a 65nm technology, and provides as many as 1.59 DMIPS/MHz. The core is delivered with a complete software development environment under Eclipse IDE, and its users get access to already ported real operating systems (Linux, Android, eCos and uClinux) and libraries.
                The BA22 family of processors has been designed for easy reuse and integration, has been rigorously verified, and is production proven. Contact CAST Sales for details.
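The two headline figures in the datasheet, 1.59 DMIPS/MHz and a 450MHz clock in 65nm, can be multiplied into a rough Dhrystone estimate. This is a back-of-the-envelope sketch only: the datasheet says "as many as" and "more than", and it does not state that both figures hold for the same core configuration, so pairing them is my assumption.

```python
# Figures quoted in the BA22-AP datasheet.
DMIPS_PER_MHZ = 1.59  # "as many as 1.59 DMIPS/MHz"
CLOCK_MHZ = 450       # "more than 450MHz in a 65nm technology"


def dhrystone_mips(dmips_per_mhz, clock_mhz):
    """Estimated Dhrystone MIPS for a core running at the given clock."""
    return dmips_per_mhz * clock_mhz


# Assumes both quoted figures apply to one configuration at once.
estimate = dhrystone_mips(DMIPS_PER_MHZ, CLOCK_MHZ)
print(f"Estimated throughput at {CLOCK_MHZ} MHz: {estimate:.1f} DMIPS")
```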
                Applications
                Internet, networking and telecom
                Portable and wireless
                Home entertainment consumer electronics
                Automotive

                Deliverables
                The core is available for ASICs in synthesizable HDL, and includes everything required for successful implementation:
                • Verilog RTL source code
                • Verilog Testbench
                • Silicon-proven Reference SoC/ASIC Design
                • Software development tools for Cygwin on Windows and Linux, with Eclipse IDE interface
                • Operating systems and board support package
                A reference design board running Linux and FPGA versions of the core are also available; contact CAST Sales for information.