#1
Registered User
Join Date: Jan 2009
Posts: 12
Hi,
I've been breaking my head trying to get Bumblebee to work. After some generous help on the #bumblebee IRC channel, I started doing some low-level testing and managed to narrow down the problem. Distro is Gentoo. Kernel version is 3.2 (I can try with 3.5 or other kernels, but I suspect the problem will persist). Driver versions I have tried are 304.22, 304.43 and 304.48. Either doing this:
Code:
# modprobe nvidia
# nvidia-xconfig -query-gpu-info
or this:
Code:
# nvidia-xconfig -query-gpu-info
the kernel log shows:
Code:
[ 48.990268] nvidia: module license 'NVIDIA' taints kernel.
[ 48.990271] Disabling lock debugging due to kernel taint
[ 49.030980] nvidia 0000:01:00.0: power state changed by ACPI to D0
[ 49.030984] nvidia 0000:01:00.0: power state changed by ACPI to D0
[ 49.030987] nvidia 0000:01:00.0: enabling device (0004 -> 0007)
[ 49.030992] nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 49.030999] nvidia 0000:01:00.0: setting latency timer to 64
[ 49.031003] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[ 49.031104] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 304.22 Mon Jul 9 21:07:07 PDT 2012
[ 54.728024] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 54.728039] NVRM: os_pci_init_handle: invalid context!
[ 54.728041] NVRM: os_pci_init_handle: invalid context!
[ 54.728045] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 54.728048] NVRM: os_pci_init_handle: invalid context!
[ 54.728049] NVRM: os_pci_init_handle: invalid context!
[ 54.971061] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1181)
[ 54.971071] NVRM: rm_init_adapter(0) failed
[ 54.974870] NVRM: RmInitAdapter failed! (0x23:0x2f:675)
[ 54.974873] NVRM: rm_init_adapter(0) failed
and lspci then reports:
Code:
01:00.0 VGA compatible controller [0300]: nVidia Corporation Device [10de:0ffc] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: nvidia
Any ideas? Thanks in advance.
Cheers,
GODLiKE
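For anyone trying to tell whether they are hitting the same failure, here is a minimal sketch that counts the "fallen off the bus" messages in a kernel log. The function name `count_bus_drops` is made up for illustration; it only assumes the log text arrives on stdin and that the NVRM message is worded as in the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints how many
# lines contain the NVRM "fallen off the bus" message.
count_bus_drops() {
    # grep -c prints the count but exits non-zero when it is 0,
    # so "|| true" keeps the function from reporting failure in that case.
    grep -c 'has fallen off the bus' || true
}

# Possible usage on a live system (not run here):
#   dmesg | count_bus_drops
```

A non-zero count right after loading the module or running a GPU program would match the symptom described in this post.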

#2
Registered User
Join Date: Jan 2009
Posts: 12
Forgot to attach nvidia-bug-report: http://www.vicarious.com.ar/~godlike...-report.log.gz (have patience, it's my home connection)

#3
Registered User
Join Date: Nov 2008
Posts: 95
Don't you need to put 'optirun' in front of any commands you want to run with the nvidia card? For instance, lspci tells you 'unknown header type 7f' because the card is off (i.e. in a low power state), so if you do 'optirun lspci' you should see more useful information. If you run nvidia-settings, you also need to specify the X display to use, i.e. "optirun nvidia-settings -c :8".
And you shouldn't need to worry about nvidia-xconfig; just edit the config file that bumblebee is using (e.g. on Ubuntu it puts this in /etc/bumblebee/xorg.conf.nvidia).

#4
Registered User
Join Date: Jan 2009
Posts: 12
Quote:
Moreover, what optirun basically does is run whatever you put after "optirun" in another X server running on the dedicated GPU, and then draw the results back to the main display. "optirun lspci" does not make sense in this scenario.

#5
Registered User
Join Date: Nov 2008
Posts: 95
Quote:
This is why your lspci command couldn't get any details about the nvidia card, and why "optirun lspci" makes perfect sense. For instance, on my system:
Code:
optirun lspci -s 1:00.0 -v
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 540M] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Dell Device 050e
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Memory at d0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at 3000 [size=128]
    [virtual] Expansion ROM at f1000000 [disabled] [size=512K]
    Capabilities: <access denied>
    Kernel driver in use: nvidia
    Kernel modules: nvidia_current, nouveau, nvidiafb
Code:
lspci -s 1:00.0 -v
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 540M] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
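The difference between the two outputs above can be checked mechanically. This is only a sketch: `looks_powered_off` is a name invented here, and it assumes that "(rev ff)" or "Unknown header type 7f" in lspci output means the device is not answering on the bus, as in the second listing.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -v` style output on stdin and succeeds
# (exit 0) if the device looks unreachable, i.e. lspci printed "(rev ff)"
# or "Unknown header type 7f" for it.
looks_powered_off() {
    grep -Eq '\(rev ff\)|Unknown header type 7f'
}

# Possible usage (needs pciutils; 01:00.0 is the slot from this thread):
#   lspci -s 01:00.0 -v | looks_powered_off && echo "card is off or gone"
```

Note that this cannot tell a card that was powered down on purpose from one that has dropped off the bus; both look the same to lspci.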

#6
Registered User
Join Date: Jan 2009
Posts: 12
I should have mentioned it before, but lspci errors out only after I get the "fallen off the bus" error (which happens whenever I try to use the GPU).
Here's what I get after a clean reboot with nothing loaded (not bbswitch, not the nvidia module, no nothing). Also, I can modprobe nvidia and run lspci afterwards and the result is the same. I also tried modprobing both nvidia and bbswitch and manually power-cycling the card, which works. Only after doing anything that actually requires use of the card (be it optirun, nvidia-xconfig, nvidia-smi, or a CUDA program) does the GPU fall off the bus, and the lspci output is then as shown in my first post.
Code:
panther godlike # lspci -d 10de: -vvnn
01:00.0 VGA compatible controller [0300]: nVidia Corporation Device [10de:0ffc] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Lenovo Device [17aa:21f5]
    Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at f0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
    Region 1: Memory at c0000000 (64-bit, prefetchable) [disabled] [size=256M]
    Region 3: Memory at d0000000 (64-bit, prefetchable) [disabled] [size=32M]
    Region 5: I/O ports at 5000 [disabled] [size=128]
    Expansion ROM at f1000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot-
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900 v1] #19
    Kernel modules: nvidia

#7
Registered User
Join Date: Jan 2009
Posts: 12
I just booted an Ubuntu 12.04 x64 live CD and can confirm that the GPU works there. At least nvidia-xconfig -query-gpu-info now gives me something.

#8
Registered User
Join Date: Jan 2009
Posts: 12
Fixed it. I was missing these two kernel options:
Code:
CONFIG_NO_HZ:
This option enables a tickless system: timer interrupts will
only trigger on an as-needed basis both when the system is
busy and when the system is idle.
CONFIG_RCU_FAST_NO_HZ:
This option causes RCU to attempt to accelerate grace periods
in order to allow CPUs to enter dynticks-idle state more
quickly. On the other hand, this option increases the overhead
of the dynticks-idle checking, particularly on systems with
large numbers of CPUs.
Anyway, I'm off to sleep. Hope this helps somebody.
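To see whether a given kernel has these two options, something like the following sketch should do. `check_options` is a hypothetical helper written for this post, and it assumes you can point it at a plain-text kernel config (on Gentoo usually /usr/src/linux/.config; /proc/config.gz also works after decompressing, if the kernel was built with CONFIG_IKCONFIG_PROC).

```shell
#!/bin/sh
# Hypothetical helper: check_options CONFIG_FILE OPTION...
# Prints "<OPTION>: enabled" when the option is set to y or m in the given
# plain-text kernel config, and "<OPTION>: missing" otherwise.
check_options() {
    cfg=$1
    shift
    for opt in "$@"; do
        # Anchor on "=" so CONFIG_NO_HZ does not match longer option names.
        if grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: missing"
        fi
    done
}

# Possible usage (paths are assumptions, adjust to your system):
#   check_options /usr/src/linux/.config CONFIG_NO_HZ CONFIG_RCU_FAST_NO_HZ
#   zcat /proc/config.gz > /tmp/kconfig && \
#       check_options /tmp/kconfig CONFIG_NO_HZ CONFIG_RCU_FAST_NO_HZ
```

An option commented out as "# CONFIG_... is not set" is reported as missing, which is the state described above.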

#9
Registered User
Join Date: Jan 2009
Posts: 12
One more thing: after doing more testing at the request of the Bumblebee guys, I could see that the IOMMU kernel configuration has an impact too. Without this option compiled in:
Code:
CONFIG_CALGARY_IOMMU:
Support for hardware IOMMUs in IBM's xSeries x366 and x460
systems. Needed to run systems with more than 3GB of memory
properly with 32-bit PCI devices that do not support DAC
(Double Address Cycle). Calgary also supports bus level
isolation, where all DMAs pass through the IOMMU. This
prevents them from going anywhere except their intended
destination. This catches hard-to-find kernel bugs and
mis-behaving drivers and devices that do not use the DMA-API
properly to set up their DMA buffers. The IOMMU can be
turned off at boot time with the iommu=off parameter.
Normally the kernel will make the right choice by itself.
If unsure, say Y.