# Problems with SORA PHY Caches

• Friday, November 19, 2010 22:50

Hi there,

I'm currently experiencing a problem that, left untreated, causes a BSOD due to an invalid memory access.

In functions like SdrPhyInitAckCache() [our group has three caches (RTS, CTS, ACK), though two of them (RTS/CTS) only hold one packet at a time; we use the cache design only in order to make use of SORA_HW_FAST_TX], this code:

    pCtsCacheMan->pCtsModulateBuffer =
        MmAllocateContiguousMemorySpecifyCache(
            MaxCtsSize,
            MmNonCached
        );
    if (pCtsCacheMan->pCtsModulateBuffer == NULL)
    {
        hr = E_NOT_ENOUGH_RESOURCE;
        break;
    }


is currently always returning E_NOT_ENOUGH_RESOURCE, meaning pCtsCacheMan->pCtsModulateBuffer always equals NULL, leading to that eventual bluescreen. (Memdump+WinDBG+proper symbols says specific cause is BB11BPMDSpreadFIR4SSE() called by SdrPhyModulateCTS(), but the real culprit, I believe, is that the pCtsModulateBuffer being passed is equal to NULL.)
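As a minimal user-mode sketch of the failure path described above: if initialization reports an allocation failure, the modulation step should refuse to run instead of writing through a NULL buffer. Every name here (`CtsCacheSketch`, `cts_cache_init`, `cts_modulate`, the status codes) is hypothetical and only mirrors the shape of the posted code; it is not the Sora SDK's API, and `memcpy` stands in for the real modulation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for this sketch (not Sora's HRESULT values). */
enum { CACHE_OK = 0, CACHE_NO_MEMORY = -1, CACHE_NOT_READY = -2 };

/* Hypothetical, simplified stand-in for the CTS cache manager. */
typedef struct {
    unsigned char *pCtsModulateBuffer;
    size_t         size;
} CtsCacheSketch;

/* Allocator that always fails, to mimic the behavior described in the post. */
void *failing_alloc(size_t n) { (void)n; return NULL; }

/* Initialize the cache; the allocator is injected so failure can be simulated. */
int cts_cache_init(CtsCacheSketch *c, size_t size, void *(*alloc)(size_t))
{
    c->pCtsModulateBuffer = (unsigned char *)alloc(size);
    c->size = (c->pCtsModulateBuffer != NULL) ? size : 0;
    return (c->pCtsModulateBuffer != NULL) ? CACHE_OK : CACHE_NO_MEMORY;
}

/* Refuse to "modulate" into a missing or undersized buffer. */
int cts_modulate(CtsCacheSketch *c, const unsigned char *frame, size_t len)
{
    if (c->pCtsModulateBuffer == NULL) return CACHE_NOT_READY;
    if (len > c->size)                 return CACHE_NO_MEMORY;
    memcpy(c->pCtsModulateBuffer, frame, len); /* placeholder for real modulation */
    return CACHE_OK;
}
```

With a guard like this, a failed allocation surfaces as an error code at the call site rather than as a bugcheck inside the modulation routine.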

Should be noted that RTS works fine, and we haven't progressed far enough to check ACK. Should also be noted that all three caches are using the ACK code as a base, so all three have:

 PHYSICAL_ADDRESS PhysicalAddress = {0, 0};



in their SdrPhyInit[Type]Cache() code. Could this be part of the problem, a memory overlap?

Also, both RTS/CTS have:

#define MAX_PHY_[TYPE]_SIZE 16 * 1024 * sizeof(COMPLEX8)
#define MAX_PHY_[TYPE]_NUM 1
#define MAX_[TYPE]_MAKE_REQ_NUM 1


Changing the 16 in MAX_PHY_[TYPE]_SIZE to any other value causes a compiler error. (Is the cache needlessly big?)

error C2118: negative subscript
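For context, C2118 is what MSVC reports when an array is declared with a negative size, which is how compile-time assertions in the style of the WDK's `C_ASSERT` signal a failed condition. One plausible explanation, not confirmed anywhere in this thread, is that some such assertion ties the cache size to another constant. The sketch below shows that pattern and, separately, why a size macro is safer when parenthesized. `COMPLEX8` here is a hypothetical two-byte stand-in, and `RAW_SIZE`, `SAFE_SIZE`, and `STATIC_CHECK` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical stand-in for Sora's COMPLEX8 (assumed: two 8-bit components). */
typedef struct { signed char re; signed char im; } COMPLEX8;

/* Unparenthesized, as posted: expands textually inside larger expressions. */
#define RAW_SIZE  16 * 1024 * sizeof(COMPLEX8)
/* Parenthesized: behaves as a single value wherever it is used. */
#define SAFE_SIZE (16 * 1024 * sizeof(COMPLEX8))

/* Compile-time check: a false condition yields an array of size -1,
   which MSVC rejects with C2118 (negative subscript). */
#define STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

STATIC_CHECK(complex8_is_two_bytes, sizeof(COMPLEX8) == 2);
STATIC_CHECK(size_is_page_multiple, SAFE_SIZE % 4096 == 0);

/* How many buffers fit in 64 KiB? The raw macro computes
   ((65536 / 16) * 1024) * sizeof(COMPLEX8), not 65536 / total. */
size_t slots_raw(void)  { return 65536 / RAW_SIZE;  }
size_t slots_safe(void) { return 65536 / SAFE_SIZE; }
```

If the SDK does contain an assertion of this kind, finding the line the compiler points at for C2118 should show which relationship the new size violates.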

Any tips on how to proceed/what is causing allocation of the contiguous memory to fail?

If I can get MmAllocateContiguousMemorySpecifyCache to work consistently for every cache, I can write a refresh-cache function that will work better than our current design: calling SdrPhyCleanupCtsCache & SdrPhyInitCtsCache for every new CTS packet seems needless.

-Cory

### All Replies

• Monday, November 22, 2010 06:35
Owner

Hi Cory,

Contiguous memory is a limited resource in kernel mode, so it is best to use as little of it as possible.

Looking at the code, I guess you have created a separate cache for each frame type. I would suggest consolidating these caches and sharing one cache among all the control frames.

As I recall, the ACK cache in the sample has 16 entries. You can share these entries, as well as the modulation buffer, with the other control frames.

You can differentiate the different frame types with keys.
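A minimal user-mode sketch of that idea, assuming the key combines the frame type with the peer's 6-byte address: one 16-entry table serves ACK, RTS, and CTS, and the type bits in the key keep the entries apart. The names (`SharedCache`, `make_key`, `shared_cache_find`, `shared_cache_insert`) and the key layout are assumptions made for illustration, not the Sora SDK's actual cache interface.

```c
#include <stdint.h>
#include <string.h>

enum { SHARED_ENTRIES = 16 };  /* matches the 16 entries mentioned above */

typedef enum { FRAME_ACK = 1, FRAME_RTS = 2, FRAME_CTS = 3 } FrameType;

typedef struct {
    int      in_use;
    uint64_t key;
} SharedEntry;

typedef struct {
    SharedEntry entries[SHARED_ENTRIES];
} SharedCache;

/* Pack the frame type (upper bits) and a 48-bit address into one key. */
uint64_t make_key(FrameType type, const uint8_t addr[6])
{
    uint64_t key = (uint64_t)type << 48;
    for (int i = 0; i < 6; i++)
        key |= (uint64_t)addr[i] << (8 * (5 - i));
    return key;
}

void shared_cache_init(SharedCache *c) { memset(c, 0, sizeof *c); }

/* Return the entry index for key, or -1 if it is not cached. */
int shared_cache_find(const SharedCache *c, uint64_t key)
{
    for (int i = 0; i < SHARED_ENTRIES; i++)
        if (c->entries[i].in_use && c->entries[i].key == key)
            return i;
    return -1;
}

/* Return the existing index for key, else claim a free entry; -1 when full. */
int shared_cache_insert(SharedCache *c, uint64_t key)
{
    int existing = shared_cache_find(c, key);
    if (existing >= 0) return existing;
    for (int i = 0; i < SHARED_ENTRIES; i++) {
        if (!c->entries[i].in_use) {
            c->entries[i].in_use = 1;
            c->entries[i].key = key;
            return i;
        }
    }
    return -1;
}
```

Because the type is part of the key, an ACK and a CTS for the same peer occupy different entries while still drawing on a single pool.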

Thanks,

- Kun

• Saturday, December 18, 2010 23:03

Kun,

I took your recommendation and implemented a version of the ACK cache that uses various keys to differentiate between packets. It works, but not consistently; I'd say maybe half the time. The other half of the time, I get a blue screen reporting DRIVER_IRQL_NOT_LESS_OR_EQUAL. WinDBG analysis:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: fffffff4, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: b314859d, address which referenced memory

Debugging Details:
------------------

CURRENT_IRQL: 2

FAULTING_IP:
SDRMiniport!__QueryPhyFrame+5d [c:\projects\sora\kernel\core\src\_phy_frame_cache.c @ 44]
b314859d 8b02      mov   eax,dword ptr [edx]

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xD1

PROCESS_NAME: System

TRAP_FRAME: b379abf0 -- (.trap 0xffffffffb379abf0)
ErrCode = 00000000
eax=89bcf470 ebx=00000000 ecx=fffffff4 edx=fffffff4 esi=89bce036 edi=881df00a
eip=b314859d esp=b379ac64 ebp=b379ac80 iopl=0     nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000       efl=00010293
SDRMiniport!__QueryPhyFrame+0x5d:
b314859d 8b02      mov   eax,dword ptr [edx] ds:0023:fffffff4=????????
Resetting default scope

LAST_CONTROL_TRANSFER: from b314859d to 80544728

STACK_TEXT:
b379abf0 b314859d badb0d00 fffffff4 00000000 nt!KiTrap0E+0x238
b379ac80 b314852a 89bcf460 00f25002 00000300 SDRMiniport!__QueryPhyFrame+0x5d [c:\projects\sora\kernel\core\src\_phy_frame_cache.c @ 44]
b379ac98 b3144800 89bcf3f0 00f25002 00000300 SDRMiniport!SoraGetPhyFrameInCache+0x1a [c:\projects\sora\kernel\core\src\_phy_frame_cache.c @ 131]
b379acc4 b3144776 89bcf3f0 8a466210 00f25002 SDRMiniport!__Ack+0x30 [c:\sorasdk\src\driver\mac\sdr_mac_rx.c @ 43]
b379ace4 b314407d 89bce000 8a466210 881df000 SDRMiniport!__DataFrameACK+0x76 [c:\sorasdk\src\driver\mac\sdr_mac_rx.c @ 241]
b379ad58 b3141bc0 89bce080 00000000 89bce084 SDRMiniport!SdrMacRx+0x18d [c:\sorasdk\src\driver\mac\sdr_mac_rx.c @ 425]
b379adac 805cff62 89bce080 00000000 00000000 SDRMiniport!__SORA_FSM_ENGINE+0xa0 [c:\projects\sora\kernel\core\src\sora.c @ 66]
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
SDRMiniport!__QueryPhyFrame+5d [c:\projects\sora\kernel\core\src\_phy_frame_cache.c @ 44]
b314859d 8b02      mov   eax,dword ptr [edx]

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: SDRMiniport!__QueryPhyFrame+5d

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: SDRMiniport

IMAGE_NAME: SDRMiniport.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4d0d33a1

FAILURE_BUCKET_ID: 0xD1_SDRMiniport!__QueryPhyFrame+5d

BUCKET_ID: 0xD1_SDRMiniport!__QueryPhyFrame+5d

Followup: MachineOwner
---------



Due to the fact that the process DOES work on occasion, I am forced to believe that the problem is not an actual IRQL level, but instead an attempt to access an invalid/blank address. The above blue-screen also occurs when attempting to transmit out CTS's (a block of code pretty much identical to ACK).
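One possible reading of the faulting address, offered as a guess rather than a diagnosis: 0xfffffff4 is -12 as a 32-bit value, which is what results when a NULL list link is converted back to its enclosing structure by subtracting a 12-byte field offset, as the `CONTAINING_RECORD` macro does. The layout below is hypothetical (the real entry structure in `_phy_frame_cache.c` is not shown in this thread); it only demonstrates the arithmetic.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 32-bit layout: three 4-byte fields precede the list link.
   The real Sora structure may differ. */
typedef struct {
    uint32_t key;
    uint32_t length;
    uint32_t flags;
    uint32_t link;   /* stands in for a list pointer field at offset 12 */
} FakeEntry;

/* Same arithmetic as CONTAINING_RECORD, done on 32-bit addresses:
   enclosing struct address = field address - field offset. */
uint32_t containing_record32(uint32_t field_addr, uint32_t field_offset)
{
    return field_addr - field_offset;
}
```

Under that assumed layout, `containing_record32(0, offsetof(FakeEntry, link))` wraps around to 0xfffffff4, the same value seen in `edx`. If the real structure has a link field at offset 12, checking for an empty or uninitialized list before the lookup would be a reasonable next step.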

Is there any way I can do step-through debugging/local variable tracing during the RX state of Sora's FSM, to narrow down where this invalid address is? It seems that the kernel mode prints too slowly, and thus there is a lot of [MAC_CS]LAG going on, resulting in no mentioning of any received packets, of any type, in the host computer's WinDBG session.

Also, due to this method, the 16 slots available in the cache fill very quickly (CTS's constantly differ due to the protocol we are working on, while ACK's and RTS's usually stay the same and thus only take up 1 slot, and are constantly reused); do you have any recommendation with regards to how to refresh the cache quickly?

Thanks,

-Cory

• Monday, December 20, 2010 01:55
Moderator

Hi, Cory:

Would you mind sending me the SDRMiniport.pdb and the crash dump file? senxiang at microsoft dot com

Also, could you try removing all of your CTS and RTS caches, then adding them back one at a time to see which module actually causes the problem? Thanks.

• Thursday, March 3, 2011 01:14

Hello, Cory and Tan,

I am now writing the RTS/CTS code, and I have some questions:

1) If I define one block of contiguous memory and let all the control frames (RTS, CTS, ACK) share it, how can I distinguish between the different control frames?

2) I know that the ACK cache in the sample provides 16 entries, but how do I share these entries among different control frames? Can you give some advice?

thanks a lot!

• Thursday, March 3, 2011 01:28

Albert;

Here's a copy of my latest able-to-be-compiled driver. Warning: there are a LOT of comments and notes, it's pretty messy.

http://cid-0c55f9bb8697aa22.office.live.com/self.aspx/.Documents/SoraSDK%2003-02-11.zip

If you look in my \src\phy ACK code, you can see my implementation of RTS/CTS/ACK's single cache, and if you look at the \inc\dot11_pkt.h header, you can see exactly where and how I distinguish what those frames consist of. \src\mac is where I am currently forced to do all my modulation. Warning: this is specific to my University's purposes and thus strays from the original 802.11b specifications; that said, a lot of things still match up -- we tested RTS/CTS against a MadWifi+Atheros Ubuntu Linux laptop and it worked, then we began modifying it for our own purposes.

We share our entries as-needed; and upon the completion of a successful transmission/reception (specifically, right after an ACK is received/sent), the cache is wiped and re-initialized, in order to be sure we don't run out of room. This does raise some minor questions with regards to additional latency in a network with a lot of activity.
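One way to limit that latency, sketched here in user-mode C under stated assumptions: keep the buffer allocated for the lifetime of the cache, make the "wipe" a cheap in-place invalidation of the entry table, and fall back to round-robin eviction when all 16 slots are taken. `SlotCache`, `slot_cache_reset`, and `slot_cache_put` are hypothetical names, and the sketch does not model how Sora transfers modulated samples to the radio hardware or whether a slot is still in use by a pending transmission.

```c
#include <stddef.h>
#include <string.h>

enum { SLOT_COUNT = 16 };

typedef struct {
    int          in_use;
    unsigned int key;
} Slot;

typedef struct {
    Slot   slots[SLOT_COUNT];
    void  *buffer;       /* allocated once; survives every reset */
    size_t next_victim;  /* round-robin eviction cursor */
} SlotCache;

/* Invalidate all entries without freeing or reallocating the buffer. */
void slot_cache_reset(SlotCache *c)
{
    memset(c->slots, 0, sizeof c->slots);
    c->next_victim = 0;
}

/* Return the slot for key: reuse a match, else take a free slot,
   else overwrite the next round-robin victim. */
int slot_cache_put(SlotCache *c, unsigned int key)
{
    for (int i = 0; i < SLOT_COUNT; i++)
        if (c->slots[i].in_use && c->slots[i].key == key)
            return i;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!c->slots[i].in_use) {
            c->slots[i].in_use = 1;
            c->slots[i].key = key;
            return i;
        }
    }
    int victim = (int)c->next_victim;
    c->next_victim = (c->next_victim + 1) % SLOT_COUNT;
    c->slots[victim].key = key;
    return victim;
}
```

With eviction in place, frames that rarely change (ACK, RTS) would only be re-modulated when their slot is actually overwritten, and a full reset becomes optional rather than required after every exchange.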

-Cory

• Thursday, March 3, 2011 04:46

Hello, Cory,

Thanks for kindly sharing. I will refer to your driver to fix my current problems :)

Best wishes.

• Thursday, March 10, 2011 02:20

Hello, Cory,

I want to ask you another question. Once two peers have completed the RTS/CTS handshake, the channel should be reserved for those two peers only. But if another peer then wants to access the channel, could the original two peers be disturbed? And how can I deal with this problem?

Thanks a lot.

• Sunday, March 20, 2011 21:27

If I understand correctly, you're asking if another node will interfere with RTS/CTS once the handshake occurs -- if the node was present when the RTS/CTS was sent out, no, it shouldn't -- in fact, that's exactly what the "Timer" field in the RTS/CTS packet is for -- setting a CHANNEL RESERVATION TIMER that ties all other nodes not involved in the transmission down so they can't interfere.
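A minimal sketch of that reservation logic, assuming the "Timer" field behaves like the standard 802.11 Duration field (a value in microseconds that overhearing stations use to set their network allocation vector, or NAV). The names `NavStation`, `nav_update`, and `medium_reserved` are hypothetical, and the modified protocol in the shared driver may differ.

```c
#include <stdint.h>

/* Per-station reservation state: the time until which the medium is
   considered busy because of an overheard RTS/CTS. */
typedef struct {
    uint64_t nav_expiry_us;
} NavStation;

/* Call for RTS/CTS frames that are NOT addressed to this station.
   The reservation is only ever extended, never shortened. */
void nav_update(NavStation *s, uint64_t now_us, uint16_t duration_us)
{
    uint64_t candidate = now_us + duration_us;
    if (candidate > s->nav_expiry_us)
        s->nav_expiry_us = candidate;
}

/* A station must defer its own transmissions while this returns nonzero. */
int medium_reserved(const NavStation *s, uint64_t now_us)
{
    return now_us < s->nav_expiry_us;
}
```

A station that overheard either the RTS or the CTS would hold off until its reservation expires, which is why the handshake protects the exchange from nodes that were listening at the time.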

If the node was not present when the RTS/CTS occurred (for example, I want to send you an email on my network and as I send that email, a wireless-capable device joins the network and starts to ping me), then I'm really not sure what happens -- I didn't program anything in particular to account for unwelcome visitors in a network during transmission.

Hope that helps.