Tag: kernel debugging

  • Windows 8 Kernel Debugging with KDNET and the Realtek RTL8168

    Existing Kernel Debugging Methods

    Until Windows 8 there were only five ways to kernel debug a Windows machine:

    1. Local Debugging (e.g. SysInternals LiveKD)
    2. Null-modem cable (a.k.a Serial or COM)
    3. IEEE 1394 (a.k.a. Firewire)
    4. USB 2.0
    5. VirtualKD

    Local debugging has serious limitations in that you can’t set breakpoints, view registers or stack traces, and are limited to debugging only those things that happen after the machine is fully booted and a user has logged in. Serial bugging has been a bread and butter method forever, supporting both hardware and virtual machines (via virtualized serial ports) but it’s really really slow. Firewire brought to the table a huge increase in speed over serial while offering fairly straight forward hardware requirements. Firewire virtualization isn’t possible with VMware (that I’m aware of anyways) and Firewire has steadily been dropped from a lot of OEM PCs in favor of USB 2.0 and more recently USB 3.0. Debugging over USB 2.0 is a bit of a unicorn. It’s technically possible to do it, but it requires a specific port on the USB controller that may or may not be available (e.g. it may be wired internally to a webcam or card reader) and a relatively expensive (~$100 USD) USB 2.0 debug cable such as the NET20DC. In practice, USB 2.0 kernel debugging is a gauntlet best avoided if at all possible. SysProg’s VirtualKD is an awesome framework. It has it’s quirks but it’s steadily improved over time. It’s by far the fastest kernel debugging transports in my experience and it’s a snap to install and configure. It’s biggest drawbacks are that it only supports virtual machines and you have to install a driver package (which may not amenable to the machine owner for various reasons).

    Windows 8 Introduces Kernel Debugging over a Network Cable

    Serial ports and Firewire were disappearing from a significant portion of PCs (laptops, workstations, and servers alike) and there are significant issues with using the USB 2.0 kernel debug transport. Microsoft needed a new efficient and effective means of supplying kernel debug transports to the IHVs and ISVs of the world. With Windows 8 and it’s kernel cousin, Server 2012, Microsoft introduced to the public two more kernel debug transports – Network and USB 3.0. Along with these, Microsoft also introduced new logo certification requirements (WHQL Debug Capability) that required system integrators to pass a series of tests to demonstrate that the system was in fact kernel debuggable. There is a great Channel 9 video released by the team here. It is my understanding that to be logo certified, you don’t have to support any one specific debugging transport, you just have to demonstrate one of them. The video states that one could even satisfy this requirement by using a custom hardware interface and debug cable as long as the hardware and any kernel transport drivers were available to end-users.

    Microsoft implemented kernel transport drivers for the two most likely vehicles for providing this: Network and USB 3.0. Network interfaces already are and will continue to be prolific for the foreseeable future. USB 3.0 will continue to increase it’s market saturation and addresses the biggest issues with USB 2.0:

    • You can use any USB 3.0 port to debug
    • The only cable you need is a relatively inexpensive (~$15 USD) USB 3.0 A-A debug cable (i.e. no Vbus pins).
    • Chipset support is standardized via xHCI which is supported by most hardware shipping today

    These are not the NICs you are looking for and other Jedi Knight tricks

    I recently received a new (Jan 2013) Dell XPS 15 L521x laptop that I needed to develop driver support for. These machines don’t have a serial or Firewire ports and I’m not about to screw around with USB 2.0 debugging. They do however, have both a gigabit NIC and a USB 3.0 controller. Furthermore, they are logo certified, so they must have passed the WHQL Debug Capability tests, most likely with one (or both) of these methods. As I don’t yet have a USB 3.0 A-A debug cable available (or another machine with USB 3.0 ports for that matter), I set out to use the network transport. Checking the PCI hardware ID I could see that the NIC was reporting itself as a Realtek RTL8168. I checked MSDN’s list of supported NICs and saw that the first model listed was the Realtek RTL8168 – yay! I turned off secure boot (a requirement to modify boot options on UEFI machines), fired up BCDEDIT, configured the machine for network debugging, copied the key over to the host machine, fired up the debugger, opened the necessary ports on the firewall, and rebooted the target machine and …. nothing happened. I tried a few more times, got nothing. Checked my firewall and my BCDEDIT configs, still nothing. I verified that my host machine was setup correctly by firing up a Windows 8 VM with a bridged network connection using the same BCEDIT commands and that was working like a charm. I fired up Wireshark to see if the target machine was even sending out packets to try and connect and it was not. I updated the UEFI firmware and every driver on the system with the latest from Dell and still nothing. I tried setting static DHCP reservations for both machines and using the dhcp=no boot option flag, still nothing. Looking back at wireshark I could see that early in the boot process there were IPv6 solicitation packets going out, but nothing for IPv4 (which is the required protocol for kernel debugging) until much later in the boot sequence. Furthermore, these IPv4 packets were only trying to solicit a private 169.254. address.

    I continued poking around for several hours, trying different things and searching the Internet for clues. It seemed like I had landed in a barren wasteland, with seemingly no chatter about anything that sounded like my issue. Then I stumbled across this post on the MSDN forums that had two responses – one from the PM on the team that owns the WHQL Debug Capability test (Matt Rolak) and one from a software design engineer on the Windows debugger team (Joe Ballantyne). Matt’s answer said that one possible cause of the OP’s question was:

    In some cases machines with 4+GB of RAM don’t work for the NIC debugging (but will if you change the system to less than 4GB). It may be that your board passed their certification test with a different bios or with less RAM and worked around the issue.

    The MSDN article on this is here: http://msdn.microsoft.com/en-gb/library/windows/hardware/hh830880 but the key part is:

    • System firmware should discover and configure the NIC device such that its resources do not conflict with any other devices that have been BIOS-configured.
    • System firmware should place the NIC’s resources under address windows that are not marked prefetchable

    It seems for some MB’s this isn’t adhered to if they have more than 4GB of RAM configured and try to use NIC debugging. The ‘workaround’ was to reduce the RAM during debugging (less than ideal we know but unless the BIOS is updated by the manufacturer there isn’t another workaround we are aware of at this time).

    The machine I was dealing with has 8GB of RAM and Matt’s reply was Jan of this year, so this could possibly be coming into play in my situation. With Dell’s sleek design of the XPS 15 L512x however, removing system memory was more than I felt like dealing with right away. It’s not terrible, but you do have to remove 6 T-5 screws and a couple of philips screws to get the back panel off to access the system memory and I wasn’t ready just yet to start taking the machine apart. Reading on, we see from Joe’s answer (emphasis mine):

    I suspect that you have an unsupported NIC in the target machine, and that KDNET on the target machine is attempting to use USB3 debugging, and that you have a MB that has USB3 hardware support, and which also “supports” USB3 debugging. […]

    What NIC is in the target machine? If it is a new model Realtek 8168 then it is likely NOT supported. (Unfortunately Realtek keeps their PCI device ID unchanged for many different hardware spins of their chips, and so some 8168 chips are supported and some are not.)

    You have got. to. be. kidding me. I double check the PCI device ID for the NIC again and confirm that it is reporting itself as Realtek RTL8168. I go back to the Dell support page to check the driver update details again a notice a key detail – Dell is reporting the Windows 8 network driver update for it’s service tag as a Realtek RTL8111F! I headed over to the Realtek site to see if I could find out specifics about the differences between the RTL8168 and the RTL8111F. The only difference that jumped out to me was the packaging and neither page discussed chipset support with Microsoft’s kernel debug transport. The RTL8111F is definitely not listed as supported on the MSDN list of supported NICs. This would also explain why I wasn’t seeing any IPv4 packets being sent out by the target machine until late in the boot, likely after the Realtek driver had been loaded.

    I found this whole ordeal pretty frustrating. It’s frustrating that Realtek is shipping hardware that is reporting the same PCI device ID that have different hardware interfaces such that the Microsoft driver that works for one, doesn’t work at all for the other. Realtek gets away with it for normal use of the NIC because they use the same driver bundle for all the 8168 and 8111 chipsets (along with several others). It’s also frustrating that, given the fact that Microsoft is obviously internally aware of this headache, that their MSDN supported Ethernet NICs page doesn’t have an asterisk on Realtek chip support stating these issues.

    Conclusion

    At this point, I’m waiting on a USB 3.0 debug cable and a PCI-E USB 3.0 adapter to arrive so I can approach it from that angle. Based on my experience, if you need to kernel debug hardware and you want to utilize the KDNET network transport protocol, I suggest you look towards the Intel and/or Broadcom chipsets if you get to choose. There are far more chipsets listed as supported from those vendors. If you end up with a machine with a Realtek controller, I still can’t tell you a universal way to determine which chipset you actually have and thus if it’s supported, so be prepared for the possibility of failure.