This document contains general information and troubleshooting help. For more specific information, search for other technical information documents or contact Novell Technical Support.
NetWare Symmetric MultiProcessing (SMP) software allows multiprocessing-enabled NetWare Loadable Modules to run on a multiprocessor computer and take advantage of the increased processing power multiprocessing provides. SMP brings a new level of performance to NetWare, but it also brings a new level of complexity. Troubleshooting problems in an SMP environment is always more difficult than in a native NetWare environment. The following information has been compiled to help clarify the process of troubleshooting problems on a NetWare SMP server.
IT IS MUCH EASIER TO TROUBLESHOOT PROBLEMS IN NATIVE NETWARE THAN IT IS IN NETWARE SMP. Unless the problems have a known solution, it is usually best to go back to native NetWare to troubleshoot. Most of the problems that occur on an SMP server will also occur when the server is taken back to the native kernel. It is much easier to determine the cause of such problems without having to work with the added complexity of SMP. If the problems do not occur under the native kernel, that information is also very useful in determining the cause. Particularly with abends, hangs, and LAN communication problems, it is very important to know if the problems are independent of the SMP modules.
SMP AND VERSIONS OF NETWARE:
NetWare 4.11 SMP is a component of intraNetWare. The following information applies to NetWare 4.11 SMP. All of the following information also applies to intraNetWare for Small Business with a few minor exceptions as noted.
If you are running NetWare 4.10 SMP, Novell's policy on supporting this product is as follows:
NetWare 4.10 SMP was an OEM-only product. NetWare 4.10 SMP is no longer supported through technical support channels at Novell. If the customer does not upgrade to version 4.11 then the customer must contact the OEM partner from whom he or she purchased the SMP package.
SMP AND FILE SYSTEM PERFORMANCE:
NetWare 4.11 SMP provides no improvement for file system performance (unless the server is running a database management system with a proprietary file system interface). With the current release of NetWare SMP, all file system access must be handled by processor 0. On a server where the disk channel is operating near maximum capacity, running the server with SMP will aggravate the problem and could cause hangs or abends.
SMP AND LAN PERFORMANCE:
Improved LAN performance is one of the major advantages of multiprocessing on a NetWare server. The LAN support modules in NetWare 4.11 SMP are multiprocessing enabled and will take advantage of any additional active processors. On a server where heavy LAN traffic is causing slowness or utilization problems, SMP can provide a very noticeable performance improvement.
SMP INSTALLATION QUESTIONS AND PROBLEMS:
During a full install of intraNetWare, INSTALL.NLM calls MPDETECT.NLM, which checks your system for additional processors. If MPDETECT.NLM finds more than one processor, you will be given the option to install SMP. MPDETECT.NLM will automatically highlight what it considers the most appropriate PSM for your hardware. (intraNetWare for Small Business does not give the option to install SMP during an initial install, but SMP can be added later in exactly the same way it can be added to intraNetWare.)
New with intraNetWare, INSTALL.NLM now contains the menu item "Multi CPU options." This option will allow you to install or uninstall SMP. See the NetWare 4.11 manual Supervising the Network, Chapter 8 for more detailed instructions. SMP support can also be enabled or disabled manually.
To manually enable/disable SMP:
To manually enable SMP, add the following three lines to the STARTUP.NCF file:
load mps14.psm ;or whichever PSM is appropriate for your hardware
load smp.nlm
load mpdriver all
With SMP it is also recommended that you add "set upgrade low priority threads=on" to the AUTOEXEC.NCF file. This prevents low priority threads, such as compression and suballocation, from being starved of the processor time they need to perform their functions. If you are troubleshooting high utilization, or are not using SMP, set this parameter to off. Be careful with this parameter, as it can cause high utilization in versions of NetWare other than SMP.
After these lines are added, restart the server. Make sure to use the correct Platform Support Module for your hardware. If you are unsure of which PSM module to use, then use the automated install or contact your hardware vendor. It is best to add these three lines to the beginning of the STARTUP.NCF file immediately following the patch load lines. Loading any other NLMs before loading the SMP modules is not recommended and will likely result in problems.
To disable SMP, simply comment out the three lines in the STARTUP.NCF file that enable SMP then restart the server.
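The enable/disable procedure above can be sketched as a small script that comments or uncomments the three SMP lines in the text of a STARTUP.NCF file. This is only an illustration: the function names are hypothetical, the leading ";" as a comment marker is assumed from the example line above, and "load disk.dsk" stands in for whatever other lines the file contains.

```python
def is_smp_line(line, psm="mps14.psm"):
    """Return True if an NCF line loads the PSM, SMP.NLM, or MPDRIVER."""
    # Drop any leading ';' so that already-commented lines are recognized too.
    words = line.strip().lstrip(";").split()
    if len(words) < 2 or words[0].lower() != "load":
        return False
    module = words[1].lower()
    return module in (psm.lower(), "smp.nlm", "smp", "mpdriver", "mpdriver.nlm")


def set_smp_enabled(ncf_text, enabled, psm="mps14.psm"):
    """Comment out (enabled=False) or restore (enabled=True) the SMP lines."""
    out = []
    for line in ncf_text.splitlines():
        if is_smp_line(line, psm):
            bare = line.strip().lstrip(";").strip()
            # Assumption: a leading ';' comments out an NCF line.
            line = bare if enabled else ";" + bare
        out.append(line)
    return "\n".join(out)


# Hypothetical STARTUP.NCF content; "load disk.dsk" is a placeholder line.
startup = "load mps14.psm\nload smp.nlm\nload mpdriver all\nload disk.dsk"
disabled = set_smp_enabled(startup, enabled=False)
print(disabled)
```

The server must still be restarted after the file is changed, and the psm argument must name the PSM that is actually in use.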
Troubleshooting installation problems:
As MPDRIVER loads, it will give a message as it activates each additional processor. If you do not see the message "Processor 1 activated..." or if the MONITOR.NLM General Information Screen does not show the correct number of Active Processors, then there are problems with SMP.
First check to be sure you are using the correct PSM. Also check to make sure the BIOS is configured correctly (see General Hardware Issues below).
Try bringing up the server without the startup files (server -ns) then manually load the SMP modules one at a time to see if there are any errors or informational messages. If an error message is given, there may be some information about the error in the section of this document on SMP error messages. Use LOAD MPDRIVER 1 (instead of LOAD MPDRIVER ALL) to explicitly activate processor 1 (processor 0 is the first processor).
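As a rough aid for the check described above, the following sketch scans captured console text for the "Processor N activated" messages and reports which additional processors never activated. How the console text is captured is not covered here, the sample text is hypothetical, and the full wording of the real message after "activated" is not assumed.

```python
import re

# Only the "Processor N activated" prefix quoted above is relied on.
ACTIVATED = re.compile(r"Processor (\d+) activated")


def missing_processors(console_text, processor_count):
    """List additional processors (1..count-1) with no activation message.

    Processor 0 is the first processor and is not activated by MPDRIVER,
    so it is never reported as missing.
    """
    activated = {int(n) for n in ACTIVATED.findall(console_text)}
    return [p for p in range(1, processor_count) if p not in activated]


# Hypothetical console capture from a four-processor machine.
sample = "Processor 1 activated\nProcessor 3 activated"
print(missing_processors(sample, processor_count=4))
```

A non-empty result suggests checking the PSM selection and the BIOS configuration before looking further.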
SMP LICENSING:
With a NetWare 4.x SMP server, there are two types of licenses. The first is the regular connection license (5, 10, 25, 50, 100, 250, 500, 1000). The second is the SMP Processor License. The base NetWare license allows up to 4 processors. If you want NetWare to use more than 4 processors, an SMP Processor License must be purchased from Novell. Each SMP Processor License allows an additional 4 processors. (A server with fewer than 5 processors needs only the connection license, and a server with 8 processors needs only a connection license and one SMP Processor License.)
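The licensing arithmetic above can be written out as a short calculation. The function name is hypothetical; the two constants come directly from the rule stated above (4 processors covered by the base license, 4 more per SMP Processor License).

```python
BASE_PROCESSORS = 4          # covered by the base NetWare license
PROCESSORS_PER_LICENSE = 4   # added by each SMP Processor License


def smp_processor_licenses(processors):
    """Return how many SMP Processor Licenses a server needs."""
    if processors < 1:
        raise ValueError("a server has at least one processor")
    extra = max(0, processors - BASE_PROCESSORS)
    # Ceiling division: a partly used block of 4 still needs a whole license.
    return -(-extra // PROCESSORS_PER_LICENSE)


for count in (4, 5, 8, 9):
    print(count, "processors:", smp_processor_licenses(count), "SMP Processor License(s)")
```

The connection license is required in every case and is not counted by this function.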
GENERAL HARDWARE ISSUES:
There are several things that are important to verify about your hardware before adding the SMP support modules to your NetWare server:
1. Make sure you have a compatible Platform Support Module (PSM). Most servers will use the MPS14.PSM file that ships with NetWare 4.11. The major exception to this is Compaq servers, which use the CPQSMP.PSM module. MPDETECT.NLM (automatically called by INSTALL.NLM when the SMP option in install is chosen) will usually select the correct PSM.
Other machines with proprietary PSM modules include CBUS_II.PSM for Corollary, Inc. C-bus II systems, NFPSM.PSM for NetFrame MP systems, and TRI_SMP.PSM for Tricord MP systems.
In order to use the MPS14.PSM file, the hardware must be STRICTLY compliant with the Intel MP 1.4 specification. Any deviation from this specification can cause problems when using the MPS14.PSM module.
2. The BIOS must also comply with the specifications. Because the MultiProcessor table is created by the BIOS, it is critical that the BIOS comply completely with the Intel MP specification. Version 1.4 of the Intel MP specification extended the configuration table. The revisions were made 7/1/95. If the BIOS is not at least several months newer than this, then it is guaranteed not to be completely compatible with the specification. If the motherboard, and especially the BIOS, are not 100% compatible with the specification, then NetWare SMP will not operate properly.
The BIOS on most MP machines has an option to enable or disable additional processors. In order for SMP to work, the additional processors must be enabled. Compaq machines require that the operating system be specified in the configuration. NetWare SMP must be selected in the configuration for the Compaq machine to work properly with NetWare SMP. If you are having trouble getting NetWare SMP to recognize additional processors, often a newer BIOS or configuration utility will help to resolve the problem.
3. PCI issues: The PCI specification has recently been rewritten. Apparently a number of issues came before the PCI spec committee as a result of Novell's SMP project. Therefore, it would seem that all hardware designed and manufactured before the spec change could fall into a class of components and systems with compatibility problems. In the experience of Novell Labs and Novell Services, such compatibility problems do generally exist. Novell does not recommend that you run NetWare SMP on a machine unless the PCI version is at least 2.1.
For more information on PCI troubleshooting see Novell Technical Information Document "PCI Troubleshooting Tips."
4. If your hardware has a secondary I/O APIC (Advanced Programmable Interrupt Controller), NetWare SMP will use this for increased performance and scalability in Symmetric I/O mode if you are using MPS14.PSM. If you are using another platform support module, check with your hardware vendor for more information on secondary I/O APIC support. Usually the secondary I/O APIC is disabled by default and must be enabled using the machine's configuration utility.
LAN COMMUNICATION PROBLEMS:
NetWare 4.11 SMP requires an ODI 3.3 LAN driver and will not work properly with a driver written to the ODI 3.2 specification. NetWare 4.11 SMP checks the ODI version of the adapter driver and the LAN support modules (the .LAN driver, ETHERTSM.NLM, and MSM.NLM) and gives an error if any of these modules is written to an older ODI specification. (Compaq servers that contain built-in network adapters should not use the Novell ETHERTSM.NLM and MSM.NLM; contact Compaq for the latest version of their NetWare Programs for Compaq.) SMP will not operate correctly if the ODI specification is not 3.3 or later. Native NetWare 4.11 will usually work with the same versions of the ODI specification that worked with NetWare 4.10. However, only LAN drivers written to the ODI 3.3 specification have been certified by Novell for use with NetWare 4.11. Verify that a certified NetWare 4.11 LAN driver for your hardware exists before installing NetWare 4.11 SMP.
Possible symptoms of an incompatible LAN board or driver include: no LAN communication at all, periodic loss of connections, workstations sometimes not able to log back in until SMP is removed, and various communication problems with the other servers.
If you are having problems with SMP and LAN communications, the problems could be in the LAN card, in the LAN driver, in the system board, or in the BIOS. Check the Novell Labs web site (labs.novell.com) to see if your hardware and drivers have been tested by Novell for use with NetWare 4.11 SMP. If Novell Labs has tested your specific configuration, we should be able to help. If not, contact your hardware vendor.
ARCNET does not work with NetWare 4.11 SMP:
The following is an excerpt from the intraNetWare ARCNET Position Statement:
"Novell has made the decision to discontinue future development and enhancements to the ARCNET protocol in intraNetWare's ODI. This decision is based on market research and analyst forecasts with regards to the ARCNET protocol....
"The 32-bit LAN drivers in the current shipping version of intraNetWare were written and tested to the ODI 3.3 Specification. The 32-bit ARCNET drivers currently shipping in intraNetWare, RXNETTSM.NLM and TRXNET.LAN, were written to the ODI 3.2 Specification and have not been fully tested in the intraNetWare environment."
Again, NetWare 4.11 SMP requires an ODI 3.3 LAN driver and will not work properly with a driver written to the ODI 3.2 specification. ARCNET will not work with NetWare 4.11 SMP.
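The ODI 3.3 requirement can be illustrated with a small check over a list of LAN modules and the ODI specification each was written to. This sketch does not show how NetWare itself performs its check, and it does not read versions from the modules; the version strings are assumed to be supplied by hand (for example, from vendor documentation), and the function names are hypothetical.

```python
REQUIRED_ODI = (3, 3)  # minimum ODI specification for NetWare 4.11 SMP


def parse_version(text):
    """Turn a 'major.minor' string such as '3.3' into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def outdated_modules(modules):
    """Return the names of modules written to a spec older than ODI 3.3."""
    return sorted(
        name for name, spec in modules.items()
        if parse_version(spec) < REQUIRED_ODI
    )


# Hand-entered example: the ARCNET drivers are ODI 3.2 per the statement above.
lan_modules = {
    "MSM.NLM": "3.3",
    "ETHERTSM.NLM": "3.3",
    "RXNETTSM.NLM": "3.2",
    "TRXNET.LAN": "3.2",
}
print(outdated_modules(lan_modules))
```

Versions are compared as number pairs rather than as strings or decimals so that a later minor revision is not mistaken for an earlier one.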
SMP-SPECIFIC ERRORS:
"Server 411-1553 Fatal: Processor 1 did not activate."
This error is given just after MPDRIVER is loaded. See the suggestions above under hardware and installation for information about troubleshooting this error.
"Thread fault signal invokation" (error code 1593 or 1600).
This error is often followed by an abend: "Free detected modified memory beyond the end of cell ...". There are several possible causes of this error. This error occurs if a fault is detected by the SMP kernel while certain flags are set. These flags can be set during any of the following: calling an NLM initialization or exit routine, executing an alternate console command handler, doing event call backs, during AES events, and during some work to dos. This error message does not show the true cause of the fault. When debugging this problem it is better to change the SMP settable parameters: Set SMP Developer Option = ON and Set SMP Intrusive abend mode = OFF.
This message may occur if the server is restarted after an abend without cold booting the machine. Contact Novell Technical Support for help in troubleshooting this problem.
"Cannot Relinquish Control"
There is likely an NLM that is conflicting with loading of the SMP modules. Make sure the SMP modules are loading before all other NLMs except the patches. If there is still a problem, take the system back to the bare minimums and then try to load the SMP modules. At this point the SMP modules should load without errors. Try loading the additive or third party NLMs one at a time to identify the conflicting NLM.
"SMP WARNING: spurious hardware interrupt"
The server will display a message similar to "SMP WARNING: 100000 spurious hardware interrupt(s) detected on INT 47." The message may also read "lost hardware interrupt(s)." It is not unusual to see these messages, and they do not always indicate a problem. These messages are turned off by default until they exceed a certain limit. However, if you see these messages frequently, they probably indicate a hardware configuration problem. These messages can be completely disabled, but doing so does not fix the problem; it only masks any problem that may exist. Make sure the hardware has the latest BIOS revision from the hardware manufacturer. Check hardware configuration settings to make sure they are correct. (The higher interrupt numbers, such as INT 47 in the above message, indicate the use of a software interrupt rather than a hardware interrupt.)
"Processing Don't Marshall Information"
This is an informational message, not an error message. It simply indicates that the NLM being loaded is SMP-enabled. This means that the NLM can safely be run on any processor.
"Divide by zero"
If the server is getting "divide by zero" errors under SMP and the server is using the MPS14.PSM platform support module, using the -fb parameter "load mps14.psm -fb" may help to resolve the problem.
"Cache memory allocator out of available memory"
"Cache memory allocator exceeded the minimum cache buffer limit"
"Short term memory allocator is out of memory"
"X attempts to get more memory failed"
Or possibly other memory-related errors:
After the intraNetWare Support Pack v2 (IWSP2.exe) or LIBUPC.exe is applied, a NetWare 4.11 SMP server begins to lose memory. A condition has been discovered in CLIB that causes a memory leak on 4.11 SMP servers only. The speed of the memory loss depends on how many SMP threads are running. The problem only occurs with versions of CLIB dated 2/14/97 (from IWSP2) or 3/11/97 (from LIBUPC).
The fix for this problem is IWSP5B.EXE, or the latest support pack. Contact Novell Technical Support for the specific filename.
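As a quick illustration of the condition above, this sketch flags a server as exposed to the memory leak when SMP is loaded and the CLIB file date matches one of the two dates listed. The function name is hypothetical, and the CLIB date is assumed to have been read by hand from the server (for example, from a directory listing).

```python
from datetime import date

# CLIB dates named above, mapped to the update that delivered them.
AFFECTED_CLIB_DATES = {
    date(1997, 2, 14): "IWSP2",
    date(1997, 3, 11): "LIBUPC",
}


def clib_leak_source(clib_date, smp_loaded):
    """Return the update that introduced the leaking CLIB, or None if unaffected."""
    if not smp_loaded:
        # The leak is described as occurring on 4.11 SMP servers only.
        return None
    return AFFECTED_CLIB_DATES.get(clib_date)


print(clib_leak_source(date(1997, 2, 14), smp_loaded=True))
```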
ABENDS IN SMP:
NetWare SMP requires an enhanced debugger to handle the additional features of SMP. The abend message screens also appear different. The SMP abend message almost always begins with something similar to: "Critical: a severe OS error has occurred. Trusted ring 0. Application may have corrupted memory. The NetWare OS image has been halted. This exception precludes NetWare from continuing." This message is then followed by the specific abend message.
SMP and Abend Recovery:
When running a server with SMP, it is not recommended that you allow the server to automatically recover from software abends. This is because under SMP you run a much greater risk of data corruption after an abend if the server is allowed to continue operation than you would under native NetWare. Although it is not recommended, you may set the server to recover after abends using the settable parameter SMP Intrusive Abend. The values for this parameter are either ON or OFF. The default is ON. When this parameter is set to ON, the server starts the SMP debugger whenever the system abends. When this parameter is set to OFF, the server does not start the debugger. Instead, it halts the offending thread and continues to run. However, if the system abends during an interrupt, the server starts the debugger even if the parameter is set to OFF. The auto restart after abend feature of NetWare 4.11 does not work under SMP.
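The behavior described above reduces to a small decision rule, sketched here for clarity. This is a model of the description only, not of the kernel's actual code; the function name and the returned strings are hypothetical.

```python
def abend_action(intrusive_abend_on, during_interrupt):
    """Model what the server does on an abend, per the description above."""
    # ON (the default) always starts the debugger. With OFF, an abend that
    # happens during an interrupt still starts the debugger.
    if intrusive_abend_on or during_interrupt:
        return "start SMP debugger"
    return "halt offending thread and continue"


for setting in (True, False):
    for in_interrupt in (False, True):
        print(setting, in_interrupt, "->", abend_action(setting, in_interrupt))
```

Only one of the four combinations lets the server continue running, which is the case the paragraph above recommends against.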
Troubleshooting abends in SMP:
For the most part, the process of troubleshooting abends in SMP is the same as for native NetWare. It is, however, more difficult to isolate problems in SMP.
Simply put, an abend occurs when either the processor or the software discovers invalid information in memory. The information in memory can be made invalid by either hardware or software problems. In SMP, the memory architecture is more complex than with a single processor. This makes it more difficult to determine the cause of abends and more difficult to extract information from memory images (coredumps).
Before troubleshooting a problem with SMP, FIRST ATTEMPT TO REPRODUCE THE PROBLEM WITHOUT THE SMP MODULES LOADED. If the problem can be reproduced under the native single-processor kernel this greatly simplifies the problem by eliminating the SMP kernel as a potential cause. This will bring the issue to resolution much more quickly.
MONITORING SMP PERFORMANCE:
See the NetWare 4.11 manual Supervising the Network, Chapter 8 for additional information on monitoring SMP performance.
In NetWare SMP, the CPU load is not split equally across all processors. Processor 0 is treated differently from all other processors in NetWare SMP. Only threads that have been compiled as SMP safe can be offloaded to additional processors. All other threads must run on processor 0. NLMs that are SMP enabled can have threads running on different processors. The processing load will be approximately balanced across all processors after processor 0.
With the current release of intraNetWare, the LAN support modules, CLIB modules, and Novell Web Server 2.5 are all SMP enabled. All other modules shipped with intraNetWare must still run on processor 0. GroupWise 5 is also SMP safe.
On some servers, MONITOR.NLM will show processor utilization at 0% all of the time. This problem is not specific to SMP but seems to be more likely to occur on SMP servers. All indications are that it is purely a cosmetic problem. A fix for the problem has not yet been implemented.
SMP SET PARAMETERS:
See the NetWare 4.11 manual Supervising the Network, Chapter 8 for information on SMP-specific set parameters. Change SMP set parameters with caution and only when there is a specific reason for doing so.
SMP AND 3RD-PARTY NLMs:
Most NLMs that run under NetWare 4.11 will not have any problems under SMP. However, an NLM that has problems under native NetWare will usually have even more problems under SMP. Before loading any 3rd-party NLM on an SMP server, check with the software vendor for compatibility with NetWare SMP. Certification information may also be available from http://labs.novell.com if the product has been certified by Novell Labs.
SMP AND SFT III:
"SFT III is not compatible with NetWare SMP. SFT III is an Asymmetric Multiprocessing solution (ASMP), thus if the scalability of NetWare SMP is required, SFT III is not the proper fault tolerant solution."
Dual Processor SFT III and SMP:
An excerpt from SFT III, SMP, and Dual Processing (TID 2917398):
"SMP and SFT III are NOT compatible and never will be compatible. They cannot be used together.
However, SFT III can be set up to use two processors. The dual processor implements asymmetrical multiprocessing with SFT III. The dual processing support for SFT III allows the ioengine to use one processor and the msengine to use the other processor. If the server has more than two processors, then the first two processors are used. Any additional processors beyond the first two in the machine will be ignored."