Showing posts with label Performance.

Monday, April 17, 2017

Networking Penalty Box

In networking, we can't always have both price and performance.

     In many cases it is the equivalent of 'having your cake and eating it too.'

This trade-off is governed by the mechanism of network delivery: software or hardware.

In software, the cost of networking is relatively low and favors extremely rapid change.
   
     It is important to remember that software networking is constrained by the software architecture, as well as by the queues and threads that must be processed in concert with the operating system, hypervisor, application, and so on.

     All of these contend for CPU time, and executing a software instruction takes a relative amount of time determined by the CPU architecture.

In hardware, the cost of networking is high and favors rapid packet exchange over the ability to modify the networking function.

     I'm being very generous in this statement: the sole purpose of hardware is to move the packet from one spot to another, as rapidly as possible.

     Because the majority of the work is done in silicon, the only means to modify the network is to subroutine into software (which undermines the purpose and value of the hardware) OR to replace the silicon (which can take months to years, and costs a great deal).

Price vs Performance
Figure 1.  The price vs performance curve
Utilizing generically programmable x86 delivery mechanisms, it is possible to do many of the things required of the network at tolerable, but not fast or optimized, levels.

     Host bridges and OVS, for example, are eminently capable of meeting the bandwidth and latency requirements of an application within the confines of a hypervisor, and can be remarkably efficient at least with respect to the application's requirements.  The moment traffic exits the hypervisor or OS, however, things become considerably more complex, particularly under high virtualization ratios.

The Network Penalty Box
Figure 2.  The Network Penalty Box
Network chipset developers and network infrastructure vendors have sustained the continuing escalation in performance by designing capability into silicon.

All the while, arguably, continuing to put downward pressure on the cost per bit transferred.

Virtualization vendors, on the other hand, have rapidly introduced network functions to support their use cases.

At issue is the performance penalty for networking in x86 and where that performance penalty affects the network execution.

In general, there is a performance penalty in the neighborhood of 20-25x for executing Layer 3 routing in generic x86 instructions vs silicon.

For L2 and L3 (plus encapsulation) networking in x86 instructions vs silicon, the impact is higher still, in the neighborhood of 60-100x.
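As a rough back-of-the-envelope illustration of those multipliers: the baseline silicon latency below is an assumed number purely for the sketch, and only the 25x and 100x factors come from the text.

```python
# Hypothetical baseline: per-packet latency for L3 forwarding in silicon.
SILICON_L3_LATENCY_US = 1.0  # microseconds (assumed, for illustration only)

def x86_latency(baseline_us: float, penalty: float) -> float:
    """Effective per-packet latency when the same function runs in x86."""
    return baseline_us * penalty

# The penalty factors cited in the post.
l3_routing = x86_latency(SILICON_L3_LATENCY_US, 25)    # plain L3 routing
l2_l3_encap = x86_latency(SILICON_L3_LATENCY_US, 100)  # L2/L3 + encapsulation
```

At these factors, a function that takes a microsecond in silicon costs tens of microseconds per packet in software, which compounds quickly at East-West traffic rates.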

This adds latency we'd prefer the system not to have, especially with workload bandwidth shifting heavily in the East-West direction.

Worse, it consumes a portion of the CPU and memory of the host that could be used to support more workloads.  The consumption is so unwieldy, bursty and application dependent that it becomes difficult to calculate the impact except in extremely narrow timeslices.

Enter virtio/SR-IOV/DPDK

The theory is, take network instructions that can be optimized and send them to the 'thing' that optimizes them.

Examples include libvirt/virtio, which evolve the para-virtualization of the network interface through driver optimizations that can occur at the rate of change of software.

SR-IOV increases performance by taking a more direct route from the OS or hypervisor to the bus that supports the network interface via an abstraction layer.  This provides a means for the direct offload of instructions to the network interface to provide more optimized execution.

DPDK creates a direct-to-hardware abstraction layer that may be called from the OS or hypervisor, similarly offloading instructions for optimized execution in hardware.
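For a concrete sense of the SR-IOV plumbing, Linux exposes the configured virtual-function count for a device through sysfs (`sriov_numvfs`). A minimal sketch, with the sysfs root parameterized so the function can be exercised against a fake tree rather than a live host:

```python
from pathlib import Path

def sriov_vf_count(ifname: str, sysfs_root: str = "/sys") -> int:
    """Return the number of SR-IOV virtual functions configured on an
    interface, or 0 if the device does not expose SR-IOV.

    On a real Linux host this reads
    /sys/class/net/<ifname>/device/sriov_numvfs.
    """
    vf_file = Path(sysfs_root) / "class" / "net" / ifname / "device" / "sriov_numvfs"
    try:
        return int(vf_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0
```

A hypervisor scheduler could use a check like this to decide whether a workload gets a directly-assigned VF or falls back to the software datapath.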

What makes these particularly useful, from a networking perspective, is that elements or functions normally executed in the OS, hypervisor, switch, router, firewall, encryptor, encoder, decoder, etc., may now be moved into a physical interface for silicon based execution.

The cost of moving functions to the physical interface can be relatively small compared to putting these functions into a switch or router.  The volumes and rate of change of a CPU, chipset or network interface card have historically been higher, making introduction faster.

     Further, vendors of these cards and chipsets have practical reasons to support hardware offloads that favor their product over other vendors (or at the very least to remain competitive).

This means that network functions are moving closer to the hypervisor.

As the traditional network device vendors of switches, routers, load balancers, VPNs, etc., move to create Virtual Network Functions (VNFs) of their traditional business (in the form of virtual machines and containers) the abstractions to faster hardware execution will become ever more important.

This all, to avoid the Networking Penalty Box.

Friday, November 4, 2016

Computing, ROI and Performance - Value Chain

The surprising thing about Information Technology is the willingness to purchase equipment, service, licensing and managed services in step functions.

A combination of business and technology enhancements have made it possible to eke out additional value over time, but the greatest value almost always comes from a transformational technology change.  That change almost always alters the SIZE of the step function and who has to cover the GAP between ROI and PERFORMANCE.

Figure 1.  The ROI and Performance GAP of the Step Function of acquisition 
When purchasing ABOVE the curve, the ROI is in constant jeopardy until it meets the optimal units per cost line.

When purchasing BELOW the curve, the PERFORMANCE is in constant jeopardy until it meets the optimal units per cost line.  Even then, it is almost always in jeopardy from business budgets and the fundamentals of running a business.
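The step-versus-curve idea can be sketched numerically. The step sizes and demand figures below are made up purely for illustration; the point is that a smaller acquisition step shrinks the paid-for-but-idle gap above the demand line:

```python
def capacity_over_time(step_size: int, demand: list[int]) -> list[int]:
    """Purchased capacity when acquisitions happen in fixed steps:
    whenever demand exceeds what we own, buy another full step."""
    owned, out = 0, []
    for d in demand:
        while owned < d:
            owned += step_size
        out.append(owned)
    return out

demand = [10, 30, 55, 80, 120]                # workload units per period
big_steps = capacity_over_time(100, demand)   # mainframe-sized step
small_steps = capacity_over_time(25, demand)  # minicomputer-sized step

# Idle (purchased but unused) capacity under each regime.
idle_big = sum(c - d for c, d in zip(big_steps, demand))
idle_small = sum(c - d for c, d in zip(small_steps, demand))
```

With these toy numbers the large step leaves roughly four times the idle capacity of the small one, which is the GAP the purchaser above the curve is carrying.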

What is really interesting is how technology plays a particular role in changing the size of the step function.  Where the depiction in Figure 2 is extremely idealized, the transformational impact is not.

Figure 2.  Evolution of Computing Value Chain
Ascribing a single evolution event to the changes (and yes, there are more than one, but this is an idealized view), the changes in the industry have "almost always" led to a transformation event.  Each evolutionary change has, almost certainly, led to a change in the units-per-cost step function, either explicitly as part of the acquisition cost or as part of an underlying cost associated directly with the mechanism of the step function.

As an example:  Mainframe services moving toward Minicomputers was a size evolution.  It had a direct impact on cost of acquisition and therefore reduced the step size.  One didn't have to purchase an entire mainframe, one only had to purchase a Minicomputer.

BTW, Mainframe is still a very good business to be in, but the question needs to be asked, do you want your business in Mainframes?

Another example:  Server to Virtualization was an asset optimization.  It directly impacted the cost of managing workloads on servers as well as the capital to workload ratio.  This affected the step function by increasing the utilization of computing assets and increasing the overall ratio of administrators to workloads.

Above MicroServices, the roadmap and view starts to get a little hazy.  It looks like application functions will be decoupled and tied together with some type of API (Application to Platform) Chaining (or "Application Chaining"), sort of like the methods considered for Service Chaining in the Telecom industry for MANO.

While there may be some problems implementing that type of approach, ultimately it has become less and less about the hardware and more and more about the software running on the hardware.

It is expected that this trend will continue for the foreseeable future.

In the meantime, consider this:  Purchasing a capability or service in the Utility area of a Value Chain, like Public Cloud is right now, will be overall less asset intensive and tied more readily to the unit of execution than building a computing infrastructure from the floor up.  When the next transformational event hits, don't re-invent it; consider how to consume it as close to Utility as possible.

Friday, April 1, 2016

Network based APM

We've all been hearing quite a lot about Big Data and what the possibilities are associated with it.  Ultimately, the discussion comes down to how to get the data and how to make it relevant.

Software Defined Networking is making it possible to separate the control plane from the data plane.  The control plane information really isn't all that interesting, unless you're debugging an issue.  But the data plane quite literally contains all of the relevant data associated not only with the application, but how it operates.  

Assuming that the application's relevant data is also stored somewhere, Big Data would be able to produce interesting information from the stored data.  But what about the operation of the application?  User experience?  SLA?

The scenario for elevating the value of the more transient aspects of network communications related specifically to an application has been around in networking for quite some time.  At the most basic level, packet counters exist on a host-by-host basis within the connecting switch.  sFlow and NetFlow are more recent additions for looking at the communications on the network.  For about as long, and significantly more granular and expensive, SPAN or tap feeds into an extensive analytics engine have produced Application Performance information.

Consider this, with the addition of container based technologies, it could be possible to create a standardized mechanism for collection of developer based application awareness from within software defined networks.

Similarly, with sufficiently advanced and capable CPUs, it may also be possible to run the analytics from within the switching platform.  We can table this for now, but the possibility is there.

The model could utilize a particular aspect of the developer's framework, such as Java or .NET, that, from within the application, makes a call to a switch to start an analytics process.

With the application making calls specifically to the network infrastructure, it could possibly request that the switch spin up a collector in a container on the switch.  It could also make a call to the switch to determine which VNI is being used to transport its data, and tell the collector to extract the associated TCP information or deeper packet information from the application transport.

Once you have TCP or deeper packet data from the application, analytics could be applied to develop specific awareness of the application that could, potentially, become the basis for application performance.  With the ensuing granularity of deeper inspection, both User Experience and SLA could potentially be derived from the mechanism.  Not to mention, the possibility of extremely specific dev based tests on the application to determine things like fitness.

APM in an SDN network
Assuming the network was already established, in the simplest of these methods, the application would:
  • make an API call to the switch to determine which VNI it is associated with
  • make an API call to the Switch to spin up a collector
  • issue a TEST case for the collector to capture from within that VNI
  • send the captured data to an analytics engine to process for value
In more general testing, and assuming applicable path information, bandwidth and CPU capacity, the application would make a similar request to the control plane to issue the test case against its entire path.  It could capture deeper packet information, perform continuous Application Performance Management, and ultimately produce extensive SLA-level information about the application.
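The four steps above could be sketched against a toy, in-memory stand-in for the switch. The class and method names here (`MockSwitch`, `get_vni`, `spin_up_collector`, `run_test`) are hypothetical; a real SDN switch would expose something equivalent over a REST or gRPC API.

```python
# Toy in-memory stand-in for the switch API described in the post.
class MockSwitch:
    def __init__(self):
        self.vni_by_app = {"billing-app": 5001}  # pre-provisioned overlay
        self.collectors = {}                     # collector id -> records

    def get_vni(self, app: str) -> int:
        """Which VNI carries this application's traffic?"""
        return self.vni_by_app[app]

    def spin_up_collector(self, vni: int) -> str:
        """Pretend to launch a container-based collector on the switch."""
        cid = f"collector-{vni}"
        self.collectors[cid] = []
        return cid

    def run_test(self, cid: str, record: dict) -> None:
        """Pretend a TEST case captured one record into the collector."""
        self.collectors[cid].append(record)

def collect_apm(switch: MockSwitch, app: str) -> list:
    """The four steps from the list above."""
    vni = switch.get_vni(app)                           # 1. find the VNI
    cid = switch.spin_up_collector(vni)                 # 2. start a collector
    switch.run_test(cid, {"vni": vni, "rtt_ms": 1.2})   # 3. TEST capture
    return switch.collectors[cid]                       # 4. hand to analytics
```

Everything captured here is fabricated; the sketch only shows the control flow an application-driven APM request might take.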