From: Greg Ercolano <erco@(email suppressed)>
Subject: [SysAdmin/Windows] Getting "ERROR #1398 (has no error message)" intermittently
Date: Thu, 18 Sep 2008 15:22:40 -0400
Msg# 1779

This problem came up from two different companies within the last month,
so I thought I'd post the issue / solution here.

*** Problem: Company #1 ***

> I'm seeing these errors in the rushd.log intermittently:
>
> 08/21,16:06:38 ERROR //zserver/rush_logs/CMM/3D/logs/fex_007_scn_031): ERROR #1398 (has no error message)
> 08/21,16:06:46 ERROR //zserver/rush_logs/CMM/3D/logs/fex_003_scn_013): ERROR #1398 (has no error message)
> 08/21,16:06:46 ERROR //zserver/rush_logs/CMM/3D/logs/fex_003_scn_013): ERROR #1398 (has no error message)
>
> When it happens, it's with our file server running CIFS in guest mode,
> and use Kerberos authentication. Rush is running as a domain user.
> Sometimes it fixes itself after a reboot.
>

*** Problem: Company #2 ***

> We're intermittently getting these errors in the 'NOTES' field in irush's
> 'Frames' report.. seems to be complaining about the unc path to our BlueArc
> file server:
>
> //rtserver/projects/2008_US_VEH/3D/scenes/interior_fly/veh_mask.ma.log: ERROR #1398 (has no error message).
>
> Rush is configured to run as a domain user, and Kerberos authentication is used
> with our PDC.

*** Response ***

Error #1398 is a Microsoft filesystem authentication error number
that translates to:

    "There is a time and/or date difference between the client and server."

..which means there's more than a 5 minute(*) drift between the clocks
on the client and PDC, or client and file server.

This is not a problem with Rush, but with Windows authentication.

Microsoft's "Troubleshooting Kerberos Errors" page has info wrt time drift:
http://www.microsoft.com/downloads/details.aspx?FamilyID=7dfeb015-6043-47db-8238-dc7af89c93f1&displaylang=en

Regarding the 5 minute(*) citation, this tolerance is a default
that apparently can be changed in the group policy.

BTW, you can convert Microsoft error numbers like "1398" using 'net helpmsg', eg:

    net helpmsg 1398

..which in this case prints the time/date error as shown above:

    There is a time and/or date difference between the client and server.

Rush may print "(has no error message)" for error 1398 because that
error message wasn't known to that build of Rush at the time it was built.
Microsoft creates new error messages from time to time, as they
'embrace and extend' their way across the computing landscape.

*** Solution ***

In the case of Company #1, they fixed the problem by syncing their clocks,
and converting Rush to run on their Windows machines as a local user.
They mention they did have to reboot the clients because the mounts
would not pick up due to the time slip.

In the case of Company #2, they fixed the clock skew problem on their
BlueArc file server; the PDC and clients were all properly synchronized.

I suggested in both cases they switch to having Rush run as a local user
to take away the 24/7 dependency on PDC authentication that domain
logins impose.

Domain accounts make things easier for the sysadmin, but make things
more complicated for the machines. In normal cases such as users manually
logging in, this isn't hard on the machines. But in high load cases,
it's best to make things easier for the machines, otherwise problems crop up.

Rush is heavily exercising the machines constantly, so the weaknesses
of network authentication get exercised too.

You may actually save yourself administration tasks by taking the extra
labor effort of configuring /local/ accounts for Rush to run as on each
machine, just to take away the dependency on PDC authentication for
network rendering.
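
For scanning a whole log rather than translating one number by hand, the 'net helpmsg' lookup above can be sketched in Python. This is only an illustrative sketch: the function names, the regular expression, and the one-entry fallback table are assumptions made for this example, not part of Rush. On Windows it relies on ctypes.FormatError(), which asks the OS for the message text; on other platforms it only knows the single code discussed in this thread.

```python
import re
import sys

# Fallback table for platforms without the Win32 message catalog.
# Only the code discussed in this thread is listed (an assumption of
# this sketch, not a complete list).
KNOWN_WIN32_ERRORS = {
    1398: "There is a time and/or date difference between the client and server.",
}

# Matches the "ERROR #NNNN (has no error message)" text from the reports above.
RUSH_ERROR_RE = re.compile(r"ERROR #(\d+) \(has no error message\)")


def describe_win32_error(code):
    """Return message text for a Win32 error code.

    On Windows the OS is asked for the text; elsewhere the small fallback
    table is used and None is returned for codes it does not contain.
    """
    if sys.platform == "win32":
        import ctypes
        # ctypes.FormatError() is only available on Windows.
        return ctypes.FormatError(code).strip()
    return KNOWN_WIN32_ERRORS.get(code)


def translate_log_line(line):
    """Return (code, message) for a line with an untranslated error, else None."""
    match = RUSH_ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, describe_win32_error(code)


if __name__ == "__main__":
    sample = ("08/21,16:06:38 ERROR //zserver/rush_logs/CMM/3D/logs/fex_007_scn_031): "
              "ERROR #1398 (has no error message)")
    print(translate_log_line(sample))
```

Feeding each line of rushd.log through translate_log_line() would flag the untranslated errors along with whatever text the OS has for them.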

From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: [SysAdmin/Windows] Getting "ERROR #1398 (has no error message)"
Date: Tue, 14 Apr 2015 21:10:08 -0400
Msg# 2392

On 09/18/08 12:22, Greg Ercolano wrote:
> Error #1398 is a Microsoft filesystem authentication error number
> that translates to:
>
> "There is a time and/or date difference between the client and server."
>
> ..which means there's more than a 5 minute(*) drift between the clocks
> on the client and PDC, or client and file server.
>
> This is not a problem with Rush, but with windows authentication.
>
> Microsoft's "Troubleshooting Kerberos Errors" page has info wrt time drift:
> http://www.microsoft.com/downloads/details.aspx?FamilyID=7dfeb015-6043-47db-8238-dc7af89c93f1&displaylang=en
>
> Regarding the 5 minute(*) citation, this tolerance is a default
> that apparently can be changed in the group policy.

Wanted to follow up to this old thread; time synchronization between
Windows clients and servers can definitely affect authentication during
rendering, i.e. access to files on the file server during rendering.

This is especially true if you have the rushd service run as a domain user,
instead of as a local "workgroup" user (which avoids Kerberos and the
associated issues it can cause).

The above link "Troubleshooting Kerberos Errors" has some good info in there,
though it seems it's no longer an HTML page; they've turned it into a .DOC file.

Because URLs on microsoft.com have often gone stale, I'd like to include
some of the troubleshooting info they recommend WRT Kerberos authentication,
as many admins are not aware of just how critical time sync between machines
is on Windows networks, and how that seemingly unrelated issue can affect
file access.

Note that Kerberos authentication is by default enabled in SMB/CIFS file access,
so its subtle requirements impact file server access.

(I believe Kerberos authentication can be disabled if you want to avoid these issues..
but if you want to 'tune' it, see below and the .doc file at the above link in its entirety)

"""
Common Issues

The following sections detail the most common problems encountered by users
in Kerberos authentication environments, explain the possible causes of those
problems, and suggest how to resolve those problems.

Time Synchronization (Clock Skew)

One type of attack that Kerberos authentication was designed to prevent is
known as a “replay” attack. In a replay attack, a malicious user captures the
network traffic and replays it to fool the authenticating server into accepting
the attacker as a legitimate user who is providing credentials.

Kerberos authentication prevents a replay attack with two mechanisms:

    o The Kerberos client on the local computer encrypts a timestamp inside
      the authenticator and then sends it to the KDC. If the KDC verifies that
      the time it decrypts from the authenticator is within a specified amount
      of the local time on the KDC (the default is 5 minutes), the system can
      assume that the credentials presented are genuine.

    o All tickets issued by the KDC have an expiration time. Thus, if a ticket
      is compromised, it cannot be used outside of a specified time range -
      usually short enough to make the risk of a replay attack minimal.

Because of these mechanisms, Kerberos authentication relies on the date and
time that are set on the KDC and the client. If there is too great a time
difference between the KDC and a client requesting tickets, the KDC cannot
determine whether the request is legitimate or a replay.

Moreover, if the time difference is so great that the client is far into the
future, the client might attempt to compensate for the clock skew, but will
receive tickets that have already expired and are useless. If the client
requests new tickets, that will not solve the problem because the KDC uses
its own clock as a reference instead of the time on the client computer.

Therefore, it is vital that the time on all of the computers on a network be
synchronized in order for Kerberos authentication to function properly. This
means that all of the domains and forest in a network must use the same time
source. An Active Directory domain controller will act as an authoritative
time server for its domain, which guarantees that an entire domain will have
the same time. However, multiple domains might not have their times
synchronized. It is recommended that you use either an external time source
or a single time source within the network to synchronize all computers.

Problem

The difference between client timestamp in the authenticator or KRB_AS_REQ
and the server is greater than the Maximum tolerance for computer clock
synchronization setting in the domain policy.

Confirmation

Clock skew can be easily diagnosed by reviewing data in Event Viewer.
For more information, see:

    o 0x25: KRB_AP_ERR_SKEW: Clock Skew too great later in this white paper.
    o Clock Skew network trace in Appendix A.

Resolution

For information about how to use an external time source to synchronize all
the computers in a domain, see “How to Configure an Authoritative Time Server
in Windows 2000” in the Microsoft Knowledge Base at
http://go.microsoft.com/fwlink/?LinkId=23042.
"""

The reference above to "0x25: KRB_AP_ERR_SKEW: Clock Skew too great later
in this white paper" refers to this:

"""
0x25 - KRB_AP_ERR_SKEW: Clock skew too great

Associated internal Windows error codes

    o STATUS_TIME_DIFFERENCE_AT_DC

Corresponding debug output messages

    o DebugLog(“Client asked for endtime before starttime\n”)

Possible Causes and Resolution

This error is logged if a client computer sends a timestamp whose value
differs from that of the server’s timestamp by more than the number of minutes
found in the *Maximum tolerance for computer clock synchronization* setting in
Kerberos policy. Although this error might show up in the logs, it will not
prevent a user from being authenticated.
When this error is returned, the domain controller also supplies the correct
time on the domain controller. The Kerberos client uses the correct domain
controller time to attempt the authentication request a second time. Presuming
that the user’s credentials are valid, the user will be authenticated on the
second try.

    o This error can more commonly occur as the number of notebooks - that is,
      disconnected computers - in your network increases.

Beware that the higher you set the value of the Maximum tolerance for computer
clock synchronization setting, the more susceptible the network becomes to
replay attacks.

To set *Maximum tolerance for computer clock synchronization* Kerberos policy:

    1. Open the domain security policy by clicking Start, Programs,
       Administrative Tools, Local Security Policy.
    2. Click Account Policies, and then click Kerberos Policy.
    3. Increase the value for Maximum tolerance for computer clock
       synchronization.
    4. You can either wait for the policy change to propagate or you can run
       gpupdate /force on the client computers to force propagation immediately.
"""
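
The tolerance check described in the quoted text boils down to comparing two clocks against a limit. Below is a minimal Python sketch of that comparison, assuming you already have a client timestamp and a server (or PDC) timestamp in the same time zone, however you obtain them. The function names and DEFAULT_TOLERANCE constant are made up for this example; the 5 minute value is the default mentioned above, and the real limit is whatever your domain's Kerberos policy says.

```python
from datetime import datetime, timedelta

# Default Kerberos tolerance cited in the quoted text. The effective value
# comes from the domain's Kerberos policy, so treat this as an assumption.
DEFAULT_TOLERANCE = timedelta(minutes=5)


def clock_skew(client_time, server_time):
    """Absolute difference between the two clocks (same time zone assumed)."""
    return abs(client_time - server_time)


def skew_too_great(client_time, server_time, tolerance=DEFAULT_TOLERANCE):
    """True if the clocks differ by more than the tolerance."""
    return clock_skew(client_time, server_time) > tolerance


if __name__ == "__main__":
    # Hypothetical timestamps: the client runs 7 minutes ahead of the server.
    server = datetime(2008, 8, 21, 16, 6, 38)
    client = server + timedelta(minutes=7)
    print(clock_skew(client, server), skew_too_great(client, server))
```

Running a check like this across render nodes and the file server can point at the machine that has drifted, which in the Company #2 case above was the file server rather than the clients.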