From: Greg Ercolano <erco@(email suppressed)>
Subject: [RUSH 103] How do I use the new centralized accounting feature in
Date: Fri, 01 Feb 2013 18:13:52 -0500
Msg# 2280

> We see there's the new 'cpuacct.dbasedir' command that lets us
> configure rush to write the accounting data to a file server.
> Can you provide some details on how this works?

In short, yes, the 103 version allows you to specify the path to your
file server for cpu accounting data to be accumulated, using this
rush.conf command:

    cpuacct.dbasedir "//your/server/rush-accounting" 0755 0644
                      -----------------------------  ---- ----
                                    |                 |    |
                                    |                 |    The permissions for created files
                                    |                 |
                                    |                 The permissions for created sub-directories
                                    |
                                    The path to the file server top level directory
                                    into which the accounting data is written.

So whenever accounting data is generated by a render node, using the
above example, the accounting data would be written as ascii data to a
date oriented directory tree of the form:

    //your/server/rush-accounting/YYYY/MM/DD/HOSTNAME-cpu.acct
    ----------------------------- ---- -- -- --------
                  |                |   |  |     |
    The path specified to          |   |  |     Hostname that generated the
    "cpuacct.dbasedir"             |   |  |     accounting data, i.e. render node name
                                   |   |  |
                                   |   |  Day of month (padded to 2 digits)
                                   |   Month of year (padded to 2 digits)
                                   4 digit year

So if today is 12/31/2012, and the render node generating the data is
'tahoe', then the data would be appended to the file:

    //your/server/rush-accounting/2012/12/31/tahoe-cpu.acct

Each line in the file would be in the usual rush cpu.acct file format:
http://www.seriss.com/rush.103.00/rush/rush-cpu-acct.html

The data is swept to the server at regular intervals; the default is
every 5 minutes, but the value can be adjusted with the rush.conf
'cpuacct.dbasedir.sweepsecs' command. (Large networks may need a longer
sweep interval to keep the load on the server down.)

Included with Rush 103 are python modules that let you load this data
for computing cpu utilization, project utilization, etc. These are in
rush/examples/python/lib/RushAcct*.py, and include examples and docs
that show how to use them.
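
The directory layout described above can be sketched in a few lines of
python. This is only an illustration of the naming scheme, not code from
Rush itself: the function name acct_path() is made up for this example,
and it does not use the RushAcct modules.

```python
import datetime
import posixpath

def acct_path(dbasedir, hostname, day):
    """Build DBASEDIR/YYYY/MM/DD/HOSTNAME-cpu.acct for one host and one date.

    Hypothetical helper for illustration; not part of the Rush distribution.
    """
    return posixpath.join(
        dbasedir,
        "%04d" % day.year,       # 4 digit year
        "%02d" % day.month,      # month, padded to 2 digits
        "%02d" % day.day,        # day of month, padded to 2 digits
        hostname + "-cpu.acct",  # render node that generated the data
    )

if __name__ == "__main__":
    # Same example as above: render node 'tahoe' on 12/31/2012
    print(acct_path("//your/server/rush-accounting", "tahoe",
                    datetime.date(2012, 12, 31)))
```

For the example values this gives
//your/server/rush-accounting/2012/12/31/tahoe-cpu.acct, matching the
path shown above.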

There will be some example python web scripts that generate html graphs
based on this data, with features added in subsequent releases.

What follows is a comparison of the old rush accounting data management
technique for 102.xx vs the newer 103.xx technique:

OLD RUSH ACCOUNTING (RUSH 102)
------------------------------

You're probably familiar with the rush/var/cpu.acct files that rush has
always written locally on each render node. This accounting file's data
format is documented here, and hasn't changed much over the last 13 years:
http://www.seriss.com/rush.103.00/rush/rush-cpu-acct.html

To collect this data, the design was that a single machine could have a
crontab that periodically reached out to all the render nodes to collect
that data with something like the following:

    # ROTATE THE ACCOUNTING LOG ON ALL THE NODES
    rush -rotate rush.acct +any

    # COLLECT THE ACCOUNTING DATA FROM ALL THE NODES
    rush -catlog rush.acct +any >> today.log

...and then take the resulting data, and merge it into a database or
archive for later processing.

The reason it wrote to the local disks of each node first, instead of
writing directly to a file server, was to prevent the daemons from ever
having to touch a file server during normal operation, since even a
short server outage could hang the daemon up during a file access.

At the time Rush was designed, the operating system concept of 'threads'
did not exist. (If it had, the rush daemons could have used a thread to
write the data, to prevent hanging up the daemon's main thread.)

So at that time, it was better to have the daemons write to local
drives, and have a crontab on the server "pull" the data and write it
to the archive.

NEW RUSH ACCOUNTING (RUSH 103)
------------------------------

The old 102 technique is still available and is the default behavior in
103, but if you modify the cpuacct.dbasedir entry in the rush.conf file,
you can automate the centralization of the cpu accounting data, and you
can even disable the local cpu.acct files if you don't want that data to
accumulate locally.

The new 103 technique handles caching the data locally, and then
periodically sweeping the data out to the server, creating a date
oriented hierarchy automatically. The daemons use threads to prevent
server outages and NFS hangs from causing problems with the daemon, and
to ensure data doesn't get lost during an outage.

And now that the centralized data fits a date oriented directory
hierarchy, rush can provide tools in the form of python modules to
access this data for reporting purposes, and includes web examples that
make use of this data, which the user can either customize or use for
reference.

These library modules for loading cpu accounting data are in
rush/examples/python/lib/RushAcct* and example web script reports will
probably be in:

    rush/examples/python/reports
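
To show why the date oriented hierarchy makes reporting easy, here is a
minimal python sketch that finds every host's accounting file for a range
of dates. It is only an illustration: find_acct_files() is a made-up name,
it assumes the YYYY/MM/DD/HOSTNAME-cpu.acct layout described earlier, and
it does not use (or stand in for) the RushAcct modules, which are the
place to look for real loading and parsing code.

```python
import datetime
import glob
import os

SUFFIX = "-cpu.acct"

def find_acct_files(dbasedir, first_day, last_day):
    """Yield (date, hostname, path) for each accounting file in the range.

    Hypothetical helper for illustration; not part of the Rush distribution.
    Both first_day and last_day are included. Days with no directory on the
    server are simply skipped.
    """
    day = first_day
    while day <= last_day:
        daydir = os.path.join(dbasedir,
                              "%04d" % day.year,
                              "%02d" % day.month,
                              "%02d" % day.day)
        # One file per render node that generated data on this day
        for path in sorted(glob.glob(os.path.join(daydir, "*" + SUFFIX))):
            hostname = os.path.basename(path)[:-len(SUFFIX)]
            yield day, hostname, path
        day += datetime.timedelta(days=1)
```

A report script could loop over the results for, say, the last 7 days,
and hand each path to whatever code parses the cpu.acct lines.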