From: Greg Ercolano <erco@(email surpressed)>
Subject: [SysAdmin/Windows] Getting "ERROR #1398 (has no error message)" intermittently
   Date: Thu, 18 Sep 2008 15:22:40 -0400
Msg# 1779
This problem came up at two different companies within the last month,
so I thought I'd post the issue and solution here.

*** Problem: Company #1 ***
> I'm seeing these errors in the rushd.log intermittently:
>
> 08/21,16:06:38 ERROR      //zserver/rush_logs/CMM/3D/logs/fex_007_scn_031): ERROR #1398 (has no error message)
> 08/21,16:06:46 ERROR      //zserver/rush_logs/CMM/3D/logs/fex_003_scn_013): ERROR #1398 (has no error message)
> 08/21,16:06:46 ERROR      //zserver/rush_logs/CMM/3D/logs/fex_003_scn_013): ERROR #1398 (has no error message)
>
> When it happens, it's with our file server running CIFS in guest mode,
> and use Kerberos authentication. Rush is running as a domain user.
> Sometimes it fixes itself after a reboot.
> 
*** Problem: Company #2 ***
> We're intermittently getting these errors in the 'NOTES' field in irush's
> 'Frames' report.. seems to be complaining about the unc path to our BlueArc
> file server:
>
> //rtserver/projects/2008_US_VEH/3D/scenes/interior_fly/veh_mask.ma.log: ERROR #1398 (has no error message).
>
> Rush is configured to run as a domain user, and Kerberos authentication is used
> with our PDC.

*** Response ***

	Error #1398 is a Microsoft filesystem authentication error number
	that translates to:

		"There is a time and/or date difference between the client and server."

	..which means there's more than a 5 minute(*) drift between the clocks
	on the client and PDC, or client and file server.

	This is not a problem with Rush, but with Windows authentication.

	Microsoft's "Troubleshooting Kerberos Errors" page has info wrt time drift:
	http://www.microsoft.com/downloads/details.aspx?FamilyID=7dfeb015-6043-47db-8238-dc7af89c93f1&displaylang=en

	Regarding the 5 minute(*) citation, this tolerance is a default
	that apparently can be changed in the group policy.
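
	To make the tolerance concrete, here is a minimal Python sketch (not part
	of Rush) that flags machines whose clocks differ from a reference machine
	by more than 5 minutes. The host names and clock readings are made up for
	illustration, and the function name is hypothetical; how the readings get
	collected is left open.

```python
from datetime import datetime, timedelta

# Default Kerberos tolerance mentioned above; it can be changed in group policy.
MAX_SKEW = timedelta(minutes=5)

def skewed_hosts(readings, reference, max_skew=MAX_SKEW):
    """Return {host: skew} for hosts whose clock differs from the
    reference host's clock by more than max_skew.

    readings: dict of host name -> datetime, all sampled at the same moment.
    """
    ref_time = readings[reference]
    bad = {}
    for host, t in readings.items():
        skew = abs(t - ref_time)
        if skew > max_skew:
            bad[host] = skew
    return bad

# Hypothetical sample: clock readings taken at the same instant.
readings = {
    "pdc":        datetime(2008, 8, 21, 16, 6, 38),
    "fileserver": datetime(2008, 8, 21, 16, 13, 2),   # 6m24s ahead of pdc
    "render01":   datetime(2008, 8, 21, 16, 6, 51),   # 13s ahead of pdc
}
for host, skew in skewed_hosts(readings, "pdc").items():
    print(f"{host}: off by {skew} (limit {MAX_SKEW})")
```

	With the sample data only "fileserver" is reported, since it is the only
	machine more than 5 minutes away from the reference clock.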

	BTW, you can convert Microsoft error numbers like "1398" using 'net helpmsg',
	eg:

		net helpmsg 1398

	..which in this case prints the time/date error as shown above:

		There is a time and/or date difference between the client and server.

	Rush may print "(has no error message)" for error 1398 because that build
	of Rush didn't have that error message at the time it was built. Microsoft
	creates new error messages from time to time, as they 'embrace and extend'
	their way across the computing landscape.
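
	If a log has many of these lines, a small script can pull out the error
	numbers to feed to 'net helpmsg'. The following Python sketch assumes the
	lines look like the ones quoted in the reports above; the regex and the
	function name are illustrative, not anything Rush provides.

```python
import re
from collections import Counter

# Matches the "ERROR #<n> (has no error message)" text seen in the reports above.
UNNAMED_ERROR = re.compile(r"ERROR #(\d+) \(has no error message\)")

def unnamed_error_counts(lines):
    """Count how often each untranslated Windows error number appears."""
    counts = Counter()
    for line in lines:
        m = UNNAMED_ERROR.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# Sample lines copied from the Company #1 report.
sample = [
    "08/21,16:06:38 ERROR      //zserver/rush_logs/CMM/3D/logs/fex_007_scn_031): ERROR #1398 (has no error message)",
    "08/21,16:06:46 ERROR      //zserver/rush_logs/CMM/3D/logs/fex_003_scn_013): ERROR #1398 (has no error message)",
    "08/21,16:06:46 ERROR      //zserver/rush_logs/CMM/3D/logs/fex_003_scn_013): ERROR #1398 (has no error message)",
]
for number, count in unnamed_error_counts(sample).items():
    print(f"error {number} seen {count}x -> run: net helpmsg {number}")
```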

*** Solution ***

	In the case of Company #1, they fixed the problem by syncing their clocks
	and converting Rush to run on their Windows machines as a local user. They
	mentioned they did have to reboot the clients, because the mounts would not
	pick up again after the time slip.

	In the case of Company #2, they fixed the clock skew problem on their
	BlueArc file server; the PDC and clients were all properly synchronized.

	I suggested in both cases they switch to having rush run as a local user
	to take away the 24/7 dependency on PDC authentication that domain logins
	impose.

	Domain accounts make things easier for the sysadmin, but make things
	more complicated for the machines. In normal cases such as users manually
	logging in, this isn't hard on the machines.

	But in high load cases, it's best to make things easier for the machines,
	otherwise problems crop up. Rush is heavily exercising the machines constantly,
	so the weaknesses of network authentication get exercised too. You may actually
	save yourself administration work by putting in the extra effort of
	configuring /local/ accounts for Rush to run as on each machine, just to
	take away the dependency on PDC authentication for network rendering.

   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: [SysAdmin/Windows] Getting "ERROR #1398 (has no error message)"
   Date: Tue, 14 Apr 2015 21:10:08 -0400
Msg# 2392

On 09/18/08 12:22, Greg Ercolano wrote:
> 	Error #1398 is a Microsoft filesystem authentication error number
> 	that translates to:
>
> 		"There is a time and/or date difference between the client and server."
>
> 	..which means there's more than a 5 minute(*) drift between the clocks
> 	on the client and PDC, or client and file server.
>
> 	This is not a problem with Rush, but with windows authentication.
>
> 	Microsoft's "Troubleshooting Kerberos Errors" page has info wrt time drift:
> 	http://www.microsoft.com/downloads/details.aspx?FamilyID=7dfeb015-6043-47db-8238-dc7af89c93f1&displaylang=en
>
> 	Regarding the 5 minute(*) citation, this tolerance is a default
> 	that apparently can be changed in the group policy.


    Wanted to follow up on this old thread; time synchronization between Windows
    clients and servers can definitely affect authentication during rendering,
    i.e. access to files on the file server during rendering.

    This is especially true if you have the rushd service run as a domain user
    instead of as a local "workgroup" user (which avoids Kerberos and the
    associated issues it can cause).

    The above link "Troubleshooting Kerberos Errors" has some good info
    in there, though it seems it's no longer an HTML page; they've turned it
    into a .DOC file.

    Because URLs on microsoft.com have often gone stale, I'd like to include
    some of the troubleshooting info they recommend for Kerberos
    authentication, as many admins are not aware of just how critical time
    sync between machines is on Windows networks, and how that seemingly
    unrelated issue can affect file access.

    Note that Kerberos authentication is by default enabled in SMB/CIFS
    file access, so its subtle requirements impact file server access.
    (I believe Kerberos authentication can be disabled if you want to avoid
    these issues.. but if you want to 'tune' it, see below and the .doc file
    at the above link in its entirety)

"""
Common Issues

The following sections detail the most common problems encountered by users
in Kerberos authentication environments, explain the possible causes of those
problems, and suggest how to resolve those problems.

Time Synchronization (Clock Skew)

One type of attack that Kerberos authentication was designed to prevent is known
as a “replay” attack. In a replay attack, a malicious user captures the network traffic
and replays it to fool the authenticating server into accepting the attacker as a
legitimate user who is providing credentials.

Kerberos authentication prevents a replay attack with two mechanisms:

    o The Kerberos client on the local computer encrypts a timestamp inside
      the authenticator and then sends it to the KDC. If the KDC verifies that
      the time it decrypts from the authenticator is within a specified amount
      of the local time on the KDC (the default is 5 minutes), the system can
      assume that the credentials presented are genuine.

    o All tickets issued by the KDC have an expiration time. Thus, if a ticket
      is compromised, it cannot be used outside of a specified time range — usually
      short enough to make the risk of a replay attack minimal.

Because of these mechanisms, Kerberos authentication relies on the date and time
that are set on the KDC and the client. If there is too great a time difference
between the KDC and a client requesting tickets, the KDC cannot determine whether
the request is legitimate or a replay. Moreover, if the time difference is so great
that the client is far into the future, the client might attempt to compensate
for the clock skew, but will receive tickets that have already expired and
are useless. If the client requests new tickets, that will not solve the problem
because the KDC uses its own clock as a reference instead of the time on the
client computer.

Therefore, it is vital that the time on all of the computers on a network be synchronized
in order for Kerberos authentication to function properly. This means that all of the
domains and forest in a network must use the same time source. An Active Directory
domain controller will act as an authoritative time server for its domain,
which guarantees that an entire domain will have the same time. However, multiple domains
might not have their times synchronized. It is recommended that you use either an
external time source or a single time source within the network to synchronize all computers.


Problem

The difference between client timestamp in the authenticator or KRB_AS_REQ and the server
is greater than the Maximum tolerance for computer clock synchronization setting in the
domain policy.

Confirmation

Clock skew can be easily diagnosed by reviewing data in Event Viewer.
For more information, see:

    o 0x25: KRB_AP_ERR_SKEW: Clock Skew too great later in this white paper.

    o Clock Skew network trace in Appendix A.

Resolution
For information about how to use an external time source to synchronize
all the computers in a domain, see “How to Configure an Authoritative Time
Server in Windows 2000” in the Microsoft Knowledge Base at
http://go.microsoft.com/fwlink/?LinkId=23042.
"""

    The reference above to "0x25: KRB_AP_ERR_SKEW: Clock Skew too great",
    described as appearing "later in this white paper", points to this section:

"""
0x25 - KRB_AP_ERR_SKEW: Clock skew too great

Associated internal Windows error codes

    o STATUS_TIME_DIFFERENCE_AT_DC

Corresponding debug output messages

    o DebugLog(“Client asked for endtime before starttime\n”)

Possible Causes and Resolution

This error is logged if a client computer sends a timestamp whose value
differs from that of the server’s timestamp by more than the number of minutes
found in the *Maximum tolerance for computer clock synchronization* setting
in Kerberos policy.

Although this error might show up in the logs, it will not prevent a user
from being authenticated. When this error is returned, the domain controller
also supplies the correct time on the domain controller. The Kerberos client
uses the correct domain controller time to attempt the authentication request
a second time. Presuming that the user’s credentials are valid, the user will
be authenticated on the second try.

    o This error can more commonly occur as the number of notebooks — that is,
      disconnected computers — in your network increases.

      Beware that the higher you set the value of the Maximum tolerance for
      computer clock synchronization setting, the more susceptible the network
      becomes to replay attacks.

      To set *Maximum tolerance for computer clock synchronization* Kerberos policy:

          1. Open the domain security policy by clicking Start, Programs,
             Administrative Tools, Local Security Policy.

          2. Click Account Policies, and then click Kerberos Policy.

          3. Increase the value for Maximum tolerance for computer clock synchronization.

          4. You can either wait for the policy change to propagate or you can run
             gpupdate /force on the client computers to force propagation immediately.
"""
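
    The skew check and the single retry described in the excerpt above can be
    sketched as a toy model in Python. This is only an illustration of the
    quoted description, not Microsoft's implementation: the function names are
    hypothetical, no encryption or tickets are involved, and the 5 minute
    tolerance is the default the text mentions.

```python
from datetime import datetime, timedelta

TOLERANCE = timedelta(minutes=5)   # default "Maximum tolerance" per the excerpt

def kdc_check(client_timestamp, kdc_now, tolerance=TOLERANCE):
    """Toy KDC: accept the timestamp if it is within tolerance of the KDC
    clock. On rejection, hand back the KDC's own time, as the excerpt says."""
    if abs(client_timestamp - kdc_now) <= tolerance:
        return True, None
    return False, kdc_now

def authenticate(client_now, kdc_now):
    """Toy client: try with the local clock, then retry once using the time
    the KDC supplied. Returns the number of attempts used, or None."""
    ok, kdc_time = kdc_check(client_now, kdc_now)
    if ok:
        return 1
    # Compensate for the skew using the time returned with the error.
    offset = kdc_time - client_now
    ok, _ = kdc_check(client_now + offset, kdc_now)
    return 2 if ok else None

kdc = datetime(2015, 4, 14, 21, 10, 8)
print(authenticate(kdc - timedelta(minutes=2), kdc))    # within tolerance
print(authenticate(kdc - timedelta(minutes=12), kdc))   # skewed, needs the retry
```

    In this model the retry always succeeds, which matches the excerpt but not
    necessarily real-world behavior; the reports earlier in this thread show
    file access still failing with error 1398 when clocks had drifted.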