Thanks for the post, acanella. I think this is an important troubleshooting point, so I have made a generic post for everyone:
When troubleshooting throughput bottlenecks, the two most likely causes are CPU and disk:
1) For VMs, test for vCPU contention by reserving, e.g., 2000 MHz for the VM and assigning a minimum of 4 vCPU cores, then check whether throughput increases. Also switch to 'low' optimization on both ends to determine the peak throughput the setup can reach. Generally, 'low' should produce 5-10 times the throughput of 'high'.
2) If the extra CPU MHz and cores make no difference at 'high', while 'low' runs at a high throughput rate, then the disk is most likely the cause. If it is a USB appliance, consider moving the datastore to a physical disk. If performance is key, use SSDs. If USB is the only option, go with a quality USB 3 drive. In virtual environments, keep the storage as close to the VM as possible: for example, 20-30 ms of latency on network storage will definitely slow dedup down. This is because whenever there is a match, the data has to be read back from disk (more RAM means less disk access). In other words, network storage may be ultra fast in terms of bandwidth, but if its latency is high, throughput will suffer.
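To see why latency matters more than raw bandwidth here, a rough back-of-the-envelope model helps. The sketch below is an illustration only, not how any particular dedup engine works: it assumes each match triggers one read of a fixed (hypothetical) size, that only a limited number of reads are in flight at once, and that bandwidth is unlimited. Under those assumptions, throughput is capped by how many reads per second the latency allows. The 64 KiB read size and the function name `latency_bound_throughput_mib_s` are made up for the example.

```python
def latency_bound_throughput_mib_s(read_size_kib: float,
                                   latency_ms: float,
                                   outstanding_reads: int = 1) -> float:
    """Upper bound on read throughput (MiB/s) when reads are latency-bound.

    Hypothetical model: every dedup match needs one read of read_size_kib,
    at most outstanding_reads reads are in flight, and bandwidth is unlimited.
    """
    # Each in-flight slot completes 1000 / latency_ms reads per second.
    reads_per_second = outstanding_reads * 1000.0 / latency_ms
    # Convert KiB/s to MiB/s.
    return reads_per_second * read_size_kib / 1024.0


if __name__ == "__main__":
    # Assumed 64 KiB reads; compare local SSD, fast LAN storage, and slow network storage.
    for latency in (0.1, 1.0, 25.0):
        mib_s = latency_bound_throughput_mib_s(64, latency)
        print(f"{latency:>5} ms latency -> at most {mib_s:.1f} MiB/s")
```

With one read in flight, going from 0.1 ms to 25 ms of latency drops the ceiling from roughly 625 MiB/s to 2.5 MiB/s, no matter how much bandwidth the storage has. More RAM helps for the same reason: every read served from cache never pays the disk latency at all.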
Also keep in mind that the receiving end can be a bottleneck too. For example, if one side is a recent Xeon with SSD drives but the receiving end is a 900 MHz Celeron M running the datastore on CompactFlash, the receiving end will certainly hold throughput back.
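Put simply, end-to-end throughput can be no faster than the slowest stage on either side. The minimal sketch below just picks out that stage from a set of measured rates. The stage names and numbers in the usage example are hypothetical placeholders, not real measurements; substitute whatever you measure on your own sender, network, and receiver.

```python
def slowest_stage(stages: dict[str, float]) -> tuple[str, float]:
    """Return (name, rate) of the slowest stage; the whole pipeline is capped by it."""
    name = min(stages, key=lambda stage: stages[stage])
    return name, stages[name]


if __name__ == "__main__":
    # Hypothetical rates in MB/s, for illustration only.
    measured = {
        "sender CPU (dedup)": 200.0,
        "sender SSD": 400.0,
        "network link": 110.0,
        "receiver CompactFlash": 15.0,
    }
    name, rate = slowest_stage(measured)
    print(f"Bottleneck: {name} at {rate} MB/s")
```

Upgrading the sender in this example would change nothing; only speeding up the receiver's storage raises the cap.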