VisualSim is modeling and simulation software, with component libraries, for the early architecture exploration of systems and semiconductors. VisualSim enables the architect to quickly trade off configurations for optimum timing, throughput, power consumption, and functionality. The following are the major applications of VisualSim:
Memory controller design including the selection of the arbitration and the memory type
- Is it better to use Round-Robin or First-Come, First-Served arbitration for the memory controller? If I use LPDDR4 instead of LPDDR3, can I reduce the latency from 30 us to 10 us?
- If I change the sequence of read and write requests, can I reduce the power consumption of the SoC from 5 W to 4 W?
Know more: http://site_9d944a4a-81e8-447a-9623-4142fdcef88a/Resources/Articles/selecting%20memory%20controllers%20for%20DSP%20systems.pdf
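The arbitration question above can be illustrated with a small sketch. This is not VisualSim code; it is a minimal Python model, assuming a memory that serves one request at a time with a fixed service time. The function name `simulate`, the 4-cycle service time, and the workload are all hypothetical values chosen for illustration.

```python
from collections import deque

def simulate(requests, policy, service_time=4):
    """Serve (arrival_time, master_id) requests one at a time.

    Returns {master_id: [latency, ...]} in cycles.
    policy is "fcfs" or "round_robin". All numbers are illustrative.
    """
    if policy not in ("fcfs", "round_robin"):
        raise ValueError("policy must be 'fcfs' or 'round_robin'")
    masters = sorted({m for _, m in requests})
    pending = sorted(requests)              # ordered by arrival time
    queues = {m: deque() for m in masters}  # one queue per master
    latencies = {m: [] for m in masters}
    now, i, rr = 0, 0, 0
    while i < len(pending) or any(queues.values()):
        # Admit every request that has arrived by the current time.
        while i < len(pending) and pending[i][0] <= now:
            arrival, m = pending[i]
            queues[m].append(arrival)
            i += 1
        if not any(queues.values()):
            now = pending[i][0]             # idle: jump to the next arrival
            continue
        if policy == "fcfs":
            # Oldest request across all masters wins.
            chosen = min((q[0], m) for m, q in queues.items() if q)[1]
        else:
            # Next master in rotation that has a pending request wins.
            for k in range(len(masters)):
                candidate = masters[(rr + k) % len(masters)]
                if queues[candidate]:
                    chosen = candidate
                    rr = (rr + k + 1) % len(masters)
                    break
        arrival = queues[chosen].popleft()
        now += service_time
        latencies[chosen].append(now - arrival)
    return latencies

# Hypothetical workload: master 0 bursts 8 requests at t=0,
# master 1 issues a single request at t=1.
workload = [(0, 0)] * 8 + [(1, 1)]
for policy in ("fcfs", "round_robin"):
    print(policy, "latency seen by master 1:", simulate(workload, policy)[1])
```

In this toy workload the single request from master 1 waits behind the whole burst under First-Come, First-Served (35 cycles) but is served after one burst request under Round-Robin (7 cycles), while the burst itself finishes slightly later. A real comparison would also need the memory timing, bank behavior, and power states, which this sketch leaves out.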
Cache-Memory hierarchy with the association of L2 to many cores, need for an L3, sizing of the cache, and parameter configuration of each core
- If I use three levels of cache, can I increase the hit ratio from 60% to 95%?
- If I use two levels of cache, can I reduce the request latency by 5 us?
Know more: https://www.scitepress.org/Papers/2007/21422/21422.pdf
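A first-order feel for the cache questions above comes from the standard average-access-time calculation. The sketch below is plain Python, not a VisualSim model; the per-level latencies and hit ratios are assumed values picked so that one level hits 60% of the time and three levels together hit 95%.

```python
def combined_hit_ratio(levels):
    """Fraction of requests served by any cache level.

    levels: list of (hit_latency, local_hit_ratio), ordered from L1 outward.
    """
    miss = 1.0
    for _, hit_ratio in levels:
        miss *= 1.0 - hit_ratio
    return 1.0 - miss

def average_access_time(levels, memory_latency):
    """Average latency when each miss falls through to the next level."""
    total, reach = 0.0, 1.0   # reach = fraction of requests reaching this level
    for hit_latency, hit_ratio in levels:
        total += reach * hit_latency
        reach *= 1.0 - hit_ratio
    return total + reach * memory_latency

# Assumed numbers (latency in cycles, local hit ratio per level).
l1, l2, l3 = (2, 0.60), (10, 0.50), (30, 0.75)
memory = 100

for name, levels in (("L1 only", [l1]), ("L1+L2", [l1, l2]), ("L1+L2+L3", [l1, l2, l3])):
    print(name,
          "hit ratio: %.2f" % combined_hit_ratio(levels),
          "avg latency: %.1f" % average_access_time(levels, memory))
```

With these assumed values, adding the third level raises the combined hit ratio to 0.95 and lowers the average latency from 26 to 17 cycles. The formula ignores contention between cores sharing L2 or L3, which is exactly what a dynamic simulation is needed to capture.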
Bus technology selection such as Network-on-Chip vs AXI vs TileLink vs proprietary
- If I need to connect 60 devices to my NoC, what should be the flit size and the speed of each Router?
- Will the power consumption be reduced if I move to an AXI bus from NoC?
Know more: https://site_9d944a4a-81e8-447a-9623-4142fdcef88a/benefits-of-amba-axi-over-amba-ahb-for-display-systems/
Number of core clusters, sizing of the cores in each cluster, maximum and normal clock speeds, and internal cache sizing
- What will be the response time for 20 OS tasks and 10 user applications that are distributed across four ARM A72 cores?
- Can I split the applications between 2 ARM A53 and 2 ARM A72 cores?
- What is the difference in power consumption between options 1 and 2?
- Can I reduce the clock of the A73 from 600 MHz to 350 MHz and still get 20 us response times for memory access?
Partitioning of applications and Operating System to cores
- If I put all the OS tasks on one core and the user tasks on the remaining cores, will all cores have 65% utilization?
- If the applications are partitioned across 3 cores, can I still meet the latency requirement of 40 us?
Know more: https://site_9d944a4a-81e8-447a-9623-4142fdcef88a/launchdemo/demo/HAL/MixedSoC/MultiSoC/
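The partitioning question above (OS tasks on one core, user tasks on the rest) can be roughed out with a static load calculation. The following Python sketch is only an illustration under stated assumptions: each task is reduced to a fixed percentage of one core, the function name `partition_os_on_one_core` is hypothetical, and user tasks are placed greedily on the least-loaded remaining core.

```python
def partition_os_on_one_core(os_loads, user_loads, n_cores):
    """Return per-core utilization in percent.

    os_loads / user_loads: load of each task as a percentage of one core.
    Core 0 runs every OS task; user tasks go, largest first, to the
    least-loaded of the remaining cores.
    """
    if n_cores < 2:
        raise ValueError("need at least one core for user tasks")
    utilization = [0] * n_cores
    utilization[0] = sum(os_loads)
    for load in sorted(user_loads, reverse=True):
        core = min(range(1, n_cores), key=lambda c: utilization[c])
        utilization[core] += load
    return utilization

# Assumed workload: 20 OS tasks at 3% each, 10 user applications.
os_tasks = [3] * 20
user_apps = [30, 25, 20, 20, 15, 15, 10, 10, 10, 5]

print(partition_os_on_one_core(os_tasks, user_apps, 4))  # [60, 55, 55, 50]
```

For this made-up workload the cores end up at 60%, 55%, 55%, and 50%, so the answer to "will all cores have 65% utilization?" is no. A static sum like this says nothing about response time, since it ignores task dependencies, interrupts, and contention for the bus and memory.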
Hardware-Software partitioning of applications to determine the required number of accelerators, extending the cores with vector instructions or replacing them with alternate cores
- Why is the software task causing the highest latency? Can I create an accelerator for that task?
- Is it better to implement MPEG processing using vector instructions or to develop a new accelerator?
Know more: https://site_9d944a4a-81e8-447a-9623-4142fdcef88a/launchdemo/demo/Partitioning/SoC/Power_Perf_example/
Assignment of time-sensitive functions such as diagnostics, tracking, and other critical functions to microcontrollers, and their location in the system
Know more: https://site_9d944a4a-81e8-447a-9623-4142fdcef88a/launchdemo/demo/automotive/Autosar/Autosar_WatchDog_Manager_ECU_Network/
Configuration of each device for speed, width, routing, capacity
Know more: https://archive.eetindia.co.in/www.eetindia.co.in/ART_8800686259_1800000_TA_276e0fdd.HTM
Study of the impact of IP cores on system performance
- I have purchased an IP core for encryption. How much overhead is this core adding to the AXI bus?
Know more: https://www.slideshare.net/DeepakShankar4/how-to-create-innovative-architecture-using-visualsim-60440035
To conduct the above trade-offs, the modeling setup must meet several requirements:
- Models that are quick to build and fast to modify
- Ability to run concurrent tasks in the model
- Generate traffic to emulate workload, interrupts, and events
- Quick modification of system parameters to run multiple explorations
- Large library of both existing and emerging technologies
Several modeling approaches are available to conduct these analyses. Some are analytical and others are dynamic. Some can be used prior to development and others can be used during development, or post development. Here are some different approaches.
- Microsoft Excel: The most common way to size the system is with spreadsheets. The user enters the list of devices and their associated states. Each application is associated with a set of states and devices. The latency and the power consumed are the totals across those states and devices. There are several limitations to this approach. The first is that concurrent applications cannot be evaluated. In today’s systems, there are 50-100 concurrent tasks in the SoC. The second problem is that this approach focuses on the average and does not take queuing and dependency issues into consideration. The generated latency is only a guideline, and the range of latencies that can actually occur may be very large.
- C/C++/Python: The second approach is to write a C, C++, or Python program. Such programs perform a similar role to the spreadsheet but have slightly better accuracy because they can set up conditions. Like the spreadsheet, they cannot be used to experiment with concurrent applications.
- SystemC-based simulation platform: The third, and common, approach is to use SystemC or SystemVerilog. These provide accuracy and the ability to create detailed models. They can match the RTL and can also be used for verification. Unfortunately, these models take a long time to develop and become available only alongside the completed RTL, so they cannot be used for early system specification. Moreover, these models do not have sufficient probes to detect bottlenecks. Lastly, separate models need to be constructed for performance and power analysis.
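The spreadsheet limitation described in the list above (averages only, no queuing) is easy to see with a toy example. The Python sketch below is an illustration, not any vendor's method: it assumes a single shared device served first-in, first-out, with hypothetical function names and a 10 us service time.

```python
def spreadsheet_latency(service_times):
    """Static estimate: each task sees only its own service time."""
    return list(service_times)

def queued_latency(arrivals, service_times):
    """Latency per request when all tasks share one FIFO device.

    Results are ordered by arrival time. Times are in microseconds.
    """
    free_at = 0
    latencies = []
    for arrive, service in sorted(zip(arrivals, service_times)):
        start = max(arrive, free_at)   # wait if the device is still busy
        free_at = start + service
        latencies.append(free_at - arrive)
    return latencies

# Assumed workload: three tasks, 10 us of service each,
# arriving at 0 us, 0 us, and 5 us.
arrivals = [0, 0, 5]
service = [10, 10, 10]

print("spreadsheet:", spreadsheet_latency(service))       # [10, 10, 10]
print("with queuing:", queued_latency(arrivals, service))  # [10, 20, 25]
```

The static estimate reports 10 us for every task, while the queued version shows 10, 20, and 25 us once the tasks compete for the same device. With 50-100 concurrent tasks and several shared resources, the gap between the two grows accordingly.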
VisualSim architecture exploration models are fast to build using the large library of components, which can be configured to match the timing, power, and functionality of the proposed system. These models have over 500 probes that generate statistics on latency, throughput, power consumed, heat, Quality-of-Service, buffer usage, number of requests rejected, number of I/Os, hit ratio, utilization of all resources, instantaneous power, and power per device. Most models can be built in a few weeks, and explorations can start quickly after that.
Have a question, or would you like to know more? Reach out to us!
