December 12, 2019

Intel K.K.

- 矢澤 克巳, Director of HPC Business Development
- カ 翠湖, HPC Technical Solution Sales

## 1ST ERA IN HPC

Vertically integrated systems; proprietary hardware and software.

[Figure: number of HPC systems versus year, 1970 to 2010. Labeled systems: Delta, Paragon (143 GFLOP).]

Mostly based on general-purpose CPUs: x86, Linux, open standards, build-to-order systems.

[Figure: number of HPC systems (10's, 1000's, 100,000's, 1,000,000's, 10,000,000's) versus year, 1970 to 2020. Labeled systems: ASCI Red (1 TFLOP), Tianhe-2 (34 PFLOP).]

## EXASCALE: NEXT ERA IN HPC

Driven by insatiable AI compute.

## COMPUTE DEMOCRATIZATION

Technology-led disruptions.

[Figure: compute ($10^{2}$, $10^{4}$, $10^{9}$, $10^{15}$, $10^{18}$) versus year, 1980 to 2040, built up over three slides. Labels: PC era; digitize everything; network everything; 1 billion internet-connected devices; mobile + cloud era; mobile everything; cloud everything; 10 billion cloud-connected devices; intelligence era.]

## EXASCALE FOR EVERYONE

### HETEROGENEOUS TAXONOMY

Scalar, vector, matrix, spatial.

#### TECHNOLOGY FOR EXASCALE

- Need a huge leap in perf/watt and perf/mm²
- Intel next-generation 7 nm process and Foveros<sup>TM</sup> packaging
- Compute density

#### MEMORY FOR EXASCALE

- Need a huge leap in bandwidth/watt and footprint/mm²
- EMIB for HBM and Foveros<sup>TM</sup> for Rambo Cache

#### CONNECTIVITY FOR EXASCALE

- Scale out to many GPUs per node
- Unified memory
- CXL-based connectivity

#### RELIABILITY FOR EXASCALE

- Xeon<sup>TM</sup>-class RAS
- In-field repair
- ECC and parity across all memory and caches

#### EXASCALE GPU

## Intel® Xeon® processors: continued evolution

## The only data center CPU optimized for convergence

- Intel® Advanced Vector Extensions 512
- Intel® Deep Learning Boost (Intel® DL Boost)
- Intel® Optane™ DC persistent memory

Roadmap shown across 2019, 2020, and 2021:

| Processor | Process | Highlights |
| --- | --- | --- |
| Cascade Lake | 14 nm | New AI acceleration (VNNI); a new memory/storage tier; industry-leading performance |
| Cooper Lake | 14 nm | Next-generation Intel® DL Boost (bfloat16) |
| Ice Lake | 10 nm | Samples now shipping |
| Sapphire Rapids | | Next-generation technology |

## 2nd Generation Intel® Xeon® Scalable processors: value

- Leading workload performance
- Breakthrough memory innovation
- Artificial intelligence acceleration
- Hardware-assisted security
- Agility and utilization …

## AI readiness across the entire infrastructure, delivered by 2nd Generation Intel® Xeon® Scalable processors

- New AI acceleration: Intel® Deep Learning Boost
- VNNI (Vector Neural Network Instruction): built-in inference acceleration
- http://ai.intel.com/

Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available security updates. Configuration and benchmark details are given on slide 52. No product or component can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark\* and MobileMark\*, are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance fully. For details, see http://www.intel.com/benchmarks/.

## A new level of performance: Intel® Xeon® Platinum 9200 processors

- Maximum performance: up to 56 cores per socket
- Leadership performance per rack
- High bandwidth: up to 12 channels of native DDR4 memory
- Designed for data-intensive workloads

## Intel® oneAPI

- The Intel® oneAPI project provides a unified programming model that simplifies development across diverse architectures
- Parallelism is expressed with the Data Parallel C++ language and the Intel® oneAPI library APIs
- Uncompromised performance
- Supports CPUs, GPUs, AI accelerators, and FPGAs
- Based on industry standards and open specifications
- Interoperates with OpenMP\*, Fortran, MPI, and others

## What oneAPI aims for: an alternative to single-vendor solutions

- A standards-based, cross-architecture language: DPC++, based on C++ and SYCL
- Powerful APIs designed to accelerate key domain-specific functions
- A low-level hardware interface that gives vendors a hardware abstraction layer
- Open standards, community outreach, and industry support
- Code reuse across architectures and vendors

## Intel® oneAPI Programming Guide (Beta)

- https://software.intel.com/sites/default/files/oneAPIProgrammingGuide\_5.pdf

Intel® oneAPI Programming Guide (Beta). Copyright © 2019 Intel Corporation. All Rights Reserved.

A DPC++ program has the single-source property: the host code and the device code can be placed in the same file, so the compiler treats them as the same compilation unit. This can potentially enable performance optimizations across the boundary between host and device code. The single-source property differs from a programming model like OpenCL software technology, where the host code and device code are typically in different files and the host and device compilers are different entities, which means no optimization can occur across the host/device boundary. Therefore, when scrutinizing a DPC++ program, the first step is to understand the delineation between host code and device code. More specifically, DPC++ programs are divided into scopes, similar to programming-language scope, which many languages express with { and }.

The three types of scope in a DPC++ program are:

- Application scope: code that executes on the host
- Command group scope: code that acts as the interface between the host and the device
- Kernel scope: code that executes on the device

In this example (line numbers refer to the guide's full listing), command group scope comprises lines 45 through 54 and includes the coordination and data-passing operations required to enact control and communication between the host and the device.

```cpp
// Command group scope: the lambda passed to submit().
d_queue.submit([&](sycl::handler &cgh) {
  // Accessors declare how the kernel uses each buffer.
  auto c_res = c_device.get_access<sycl::access::mode::write>(cgh);
  auto a_in = a_device.get_access<sycl::access::mode::read>(cgh);
  auto b_in = b_device.get_access<sycl::access::mode::read>(cgh);
  // Kernel scope: the lambda passed to parallel_for().
  cgh.parallel_for<class ex1>(a_size, [=](sycl::id<1> idx) {
    c_res[idx] = a_in[idx] + b_in[idx];
  });
});
```

Kernel scope, which is nested in the command group scope, comprises lines 50 to 52. Application scope consists of all the other lines not in command group or kernel scope. Syntactically, definitions are included from the top-level include file, `sycl.hpp`, and namespace declarations can be added for convenience.
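The nesting of the three scopes can be sketched in standard C++ without a SYCL toolchain. The following is a minimal host-only mock, assuming hypothetical stand-ins `MockQueue` and `MockHandler` (they are not the SYCL classes, and nothing here runs on a device) and a hypothetical helper `vector_add`. It only illustrates where application, command group, and kernel scope begin and end in the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for illustration only; NOT sycl::handler.
struct MockHandler {
    // Runs the kernel body once per index, sequentially on the host.
    void parallel_for(std::size_t range,
                      const std::function<void(std::size_t)>& kernel) {
        for (std::size_t idx = 0; idx < range; ++idx) {
            kernel(idx);
        }
    }
};

// Hypothetical stand-in for illustration only; NOT sycl::queue.
struct MockQueue {
    // Invokes the command group immediately; a real queue would
    // schedule it for execution on a device.
    void submit(const std::function<void(MockHandler&)>& command_group) {
        MockHandler cgh;
        command_group(cgh);
    }
};

// Application scope: ordinary host code that owns the data.
std::vector<int> vector_add(const std::vector<int>& a,
                            const std::vector<int>& b) {
    assert(a.size() == b.size());  // assumption: inputs have equal length
    std::vector<int> c(a.size());
    MockQueue d_queue;

    d_queue.submit([&](MockHandler& cgh) {
        // Command group scope: gather what the kernel needs, then launch it.
        const int* a_in = a.data();
        const int* b_in = b.data();
        int* c_res = c.data();

        cgh.parallel_for(a.size(), [=](std::size_t idx) {
            // Kernel scope: the per-element work.
            c_res[idx] = a_in[idx] + b_in[idx];
        });
    });
    return c;
}
```

Calling `vector_add({1, 2, 3, 4}, {10, 20, 30, 40})` produces the element-wise sums. The raw pointers play the role that accessors play in the guide's excerpt. In real DPC++ the kernel lambda is compiled for the device, which this mock does not model.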
## Getting started with Intel® oneAPI

There are two ways to get started: use Intel® DevCloud, or download the toolkits to a local environment.

https://software.intel.com/en-us/oneapi

#### Intel® DevCloud

https://software.intel.com/en-us/devcloud/oneapi

## Intel DevCloud

A sandbox for developing, testing, and running workloads across Intel CPUs, GPUs, and FPGAs using the Intel® oneAPI toolkits (Beta).

- Use the Intel® oneAPI toolkits
- Learn Data Parallel C++
- Evaluate workloads
- Build heterogeneous applications
- Prototype projects

No downloads | No hardware acquisition | No installation | No set-up & configuration

Get up & running in seconds!

# We are in a data-centric world

All data must be stored, processed, and analyzed.

### Each data tier needs its own technology innovation

#### Further NAND layer scaling

144-layer QLC, 1024 Gb.

## Intel's floating-gate cell extends to 5 bits per cell

#### Toward higher-density implementations

Bits/cell → bits/die → bits/wafer → bits/SSD → bits/rack

## Highest Density Media Meets Highest Density Form Factor

#### SOLUTION RANGE

- E1.L: 18mm and 9.5mm
- E1.S: 5.9mm

#### THERMAL EFFICIENCY

- E1.L: 2x more thermally efficient than U.2 15mm<sup>2</sup>
- E1.S: 3x more thermally efficient than U.2 7mm<sup>3</sup>

#### CAPACITY SCALING

- E1.L: up to 2.6x the capacity per 1 rack unit than U.2<sup>1</sup>
- E1.S: up to 2x the capacity per drive than M.2<sup>4</sup>

#### FUTURE READY

PCIe\* 4.0 and 5.0 ready<sup>5</sup>

## What is Intel® Optane™ Technology?
#### TRANSISTOR-LESS DESIGN

Data is written at the bit level, so each cell's state can be changed between 0 and 1 independently of other cells.

[Figure: Intel® Optane™ memory media.]

The Intel® Optane™ technology design is fundamentally different from NAND.

### Intel® Optane™ DC Technology Products Coexist

- Improving memory capacity
- Improving working storage

#### More to Be Gained by Being on the Memory Bus

[Figure: average idle random-read latency with Intel® Optane™ DC persistent memory.]

### Aerospike Certification Test (ACT) Results

The test checks for read latency under 1 ms at high write pressure for key-value database usage.

[Figure: **Aerospike Certification Test failure rate @ 300K TPS**, Alderstream vs. current Intel® SSDs (lower is better). Axis: device read latency in microseconds (µs), with ticks at 0, 32, 64, 128, 256, 512, 1024.]

[Figure: **Maximum TPS at <5% ACT failure rate** (higher is better).]

## Bottlenecks: The Nemesis of HPC Performance

HPC clusters are only as fast as their slowest component.

#### Intel® Optane™ DC persistent memory

Changing the memory and storage paradigm.

#### HPC storage model

#### AI storage model

FLOPS : bytes per second = 100,000 : 1

#### Distributed Asynchronous Object Storage (DAOS)

A new open-source, high-performance storage software solution architected for DCPMM:

- Small I/Os are stored in Intel Optane DC persistent memory
- Bulk I/Os go straight to the NVMe SSDs
- Built entirely in userspace

[Figure: DAOS storage engine architecture. Labels: low-latency, high-message-rate communications; collective operations & in-storage computing; DAOS Storage Engine; metadata, low-latency I/Os & indexing/query; Memory Interface (Persistent Memory Development Kit); NVMe Interface (Development Kit).]

### Distributed Asynchronous Object Storage (DAOS): converged model

HPC storage model. FLOPS : bytes per second = 10 : 1

## Available on GitHub

- https://github.com/daos-stack/daos
- \* Download the solution brief from www.intel.com/hpc
- DAOS public roadmap: https://wiki.hpdd.intel.com/display/DC/Roadmap

"The Argonne Leadership Computing Facility will be the first full-scale production deployment of the DAOS storage system, as part of Aurora, the first exascale system in the United States, arriving in 2021. The DAOS storage system is designed to provide the level of metadata capacity and utilization that I/O-extensive workloads require on exascale-class machines."

Susan Coghlan, ALCF-X Project Director / Exascale Computing Systems Deputy Director

#### Notices & Disclaimers

Intel® technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Performance results are based on testing as of the dates shown in configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark\* and MobileMark\*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, go to www.intel.com/benchmarks.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel® microprocessors. These optimizations include SSE2 and SSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel® microprocessors. Certain optimizations not specific to Intel® microarchitecture are reserved for Intel® microprocessors.
Please refer to the applicable product user and reference guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

Cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced website and confirm whether referenced data are accurate.

© Intel Corporation. Intel, Xeon, Optane, AVX, and DL Boost are trademarks of Intel Corporation in the U.S. and/or other countries. \*Other names and brands may be claimed as property of others.

## THANK YOU!

RAJA KODURI

Business Forecast: Statements in this document that refer to Intel's plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel's results and plans is included in Intel's SEC filings, including the annual report on Form 10-K.

## APPENDIX A - INCREASING DENSITY: SSD FORM FACTOR INNOVATION

Footnote 1. As measured by Office 365\* application launch with background activity (18 GB file copy). Configuration: CPU: Intel\* Core™ i5-8265U CPU (4 cores, 8 threads, 1.6 GHz base frequency, 3.9 GHz max turbo frequency) on HP Envy x360 2-in-1 15.6" 15M-DR0011DX (BIOS F.03) with Intel\* UHD 620 graphics and Intel\* Optane™ Memory H10 (512 GB) vs.
AMD\* Ryzen 7 3700U CPU (4 cores, 8 threads, 2.3 GHz base frequency, 4.0 GHz max turbo frequency) on HP Envy x360 2-in-1 15.6" 15M-DS0012DX (BIOS F.07) with Radeon\* Vega 10 graphics and Toshiba XG5 (512 GB), both with 8 GB DDR4 RAM. Storage Driver: Intel\* Rapid Storage Technology 17.2.0.1009 for H10, Windows inbox driver for XG5. OS: Windows\* 10 RS5 Version 1809, Build 17763. MS Office 365 Version 1902 Build 11328.

Footnote 2. As measured by Path of Exile\* game launch with background activity (18GB local file copy), comparing AMD Ryzen\* 7 3700U on HP Envy x360 2-in-1 15.6" 15M-DS0012DX (BIOS F.07) with Toshiba XG5 (TLC NAND SSD) 512GB vs. Intel\* Core i7-8565U on HP Envy x360 2-in-1 15.6" 15M-DR0012DX (BIOS F.03) with Toshiba XG5 512GB vs. Intel\* Core™ i7-8565U with Intel\* Optane™ Memory H10 512GB; all configs with 8GB RAM. H10 configs tested with RST driver 17.2.0.1009; XG5 configs with Windows inbox driver. All configs used Windows 10 Home 64-bit version 1809, build 17763. Path of Exile version 3.6.6c.

Footnote 3. As measured by Photoshop CC 2019 and GIMP application launch with background activity (18 GB file copy). Configuration: CPU: Intel® Core™ i5-8265U CPU (4 cores, 8 threads, 1.6 GHz base frequency, 3.9 GHz max turbo frequency) on HP Envy x360 2-in-1 15.6" 15M-DR0011DX (BIOS F.03) with Intel® UHD 620 graphics and Intel® Optane™ Memory H10 (512 GB) vs. AMD® Ryzen 7 3700U CPU (4 cores, 8 threads, 2.3 GHz base frequency, 4.0 GHz max turbo frequency) on HP Envy x360 2-in-1 15.6" 15M-DS0012DX (BIOS F.07) with Radeon® Vega 10 graphics and Toshiba XG5 (512 GB), both with 8 GB DDR4 RAM. Storage Driver: Intel® Rapid Storage Technology 17.2.0.1009 for H10, Windows inbox driver for XG5. OS: Windows® 10 RS5 Version 1809, Build 17763. Photoshop CC 2019 Version 20.0.4. GIMP Version 2.10.8.

Footnote 4. As measured by PCMark 10 Standard Benchmark (App Start-up score).
Configuration: CPU: Intel® Core™ i5-8265U CPU (4 cores, 8 threads, 1.6 GHz base frequency, 3.9 GHz max turbo frequency) on HP Envy x360 2-in-1 15.6" 15M-DR0011DX (BIOS F.03) with Intel® UHD 620 graphics and Intel® Optane™ Memory H10 (512 GB) vs. AMD® Ryzen 7 3700U CPU (4 cores, 8 threads, 2.3 GHz base frequency, 4.0 GHz max turbo frequency) on HP Envy x360 2-in-1 15.6" 15M-DS0012DX (BIOS F.07) with Radeon® Vega 10 graphics and Toshiba XG5 (512 GB), both with 8 GB DDR4 RAM. Storage Driver: Intel® Rapid Storage Technology 17.2.0.1009 for H10, Windows inbox driver for XG5. OS: Windows® 10 RS5 Version 1809, Build 17763. PCMark 10 GUI version 2.0.2115. SystemInfo version 5.18.705. System benchmarks version 1.1.

Footnote 5. Based on Iometer testing at QD1 using a 4k random read scenario. Configuration: CPU: Intel® Core™ i5-8265U (on HP Envy x360 2-in-1 15.6" 15M-DR0011DX, BIOS F.03) vs. AMD Ryzen\* 5 3500U (on HP Envy x360 2-in-1 15.6" 15M-DS0011DX, BIOS F.07); storage Intel® Optane™ Memory H10 512GB vs. Toshiba XG5 512GB (TLC NAND); both with 8GB RAM. Storage Driver: Intel® Rapid Storage Technology 17.2.0.1009 for H10, Windows inbox driver for XG5. OS: Windows\* 10 Home 64-bit Version 1809, Build 17763.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.

\*Other names and brands may be claimed as the property of others.
## APPENDIX B

#### Intel is Leading the Way with NVM Technology

1. 1st to 3Xnm (34nm): https://phys.org/news/2009-07-intel-industry-nanometer-nand-solid-state.html
2. 1st to 2Xnm (25nm): https://www.intel.com/pressroom/archive/releases/2010/20100201comp.htm
3. 1st 128GB with 1st integrated Hi-K Metal Gate Stack: https://www.pcmag.com/article2/0,2817,2397287,00.asp
4. Highest Density 3D NAND, based on launch on March 26, 2015, comparing to other NAND die in production at that time: https://newsroom.intel.com/news-releases/micron-and-intel-unveil-new-3d-nand-flash-memory/
5. Areal Density. Source: IEEE. Comparing areal density of Intel measured data on 512 Gb Intel 3D NAND to representative competitors, based on 2017 IEEE International Solid-State Circuits Conference papers citing Samsung Electronics and Western Digital/Toshiba die sizes for 64-stacked 3D NAND component.
6. 1st to 64 layer TLC: http://www.storagereview.com/intel shows off new tech ships 1st 64 layer 3d nand for data center
7. Source: Intel. 1st PCIe\* Intel QLC 3D NAND SSD. Based on Intel achieving PRQ status of Intel® SSD D5-P4320 on 13 July 2018.
8. 1st PCIe\* QLC SSD for Client: https://www.tomshardware.com/reviews/intel-ssd-660p-qlc-nvme,5719.html