Installing a new CPR node

This is a fairly easy process. Currently, cpradmin should only be run for campus nodes, because old nodes on the other meshes have not been updated to use the most recent setup (among other things, the other meshes still use the dev user instead of the cpr user). cprsetup is still fine to use on any machine to set up accounts, including machines that won't be part of a CPR mesh, such as servers.
  1. First, set up the machine. Boot it with a RHEL4 CD on the NS subnet and give it the following boot command:
    linux ks=
  2. Once the machine has RHEL4 on it, edit /etc/sysconfig/network and /etc/sysconfig/network-scripts/ifcfg-eth0 to set the box's hostname and network settings.
  3. Install the box in its new home. Make sure DNS works and that you can ssh to it from cpr-central.
  4. If this is _not_ a new hostname, make sure to remove all entries for it from known_hosts for the nagios user on cpr-southcentral and for the cpr user on cpr-central.
  5. scp ~cpr/admin/ to the new host (username: root, and the password from the kickstart)
  6. On the host, as root, run cprsetup. The first argument is the name of the network (campus, gammon, home, etc.) and is required. The second argument is optional. You only need it if the name the box should report in as differs from the box's hostname. (e.g., the hostname is but it should report in as
    ./ campus
  7. At the end of the script, you will be shown several steps to do on cpr-central. The instructions below are a little more detailed, so use them instead.
  8. Add the printed ssh key to authorized_keys for the cpr and cpr-data users on beaker.
  9. On cpr-southcentral, log in as the nagios user and make sure you can ssh to the new cpr host as the nagios user. Resolve any issues until ssh works with no warnings and authenticates with the ssh key. You may need to delete an entry or two from nagios's known_hosts file and accept the new host key when sshing.
  10. On cpr-central, log in as the cpr user and make sure you can ssh to the new cpr host as the cpr user. Resolve any issues until ssh works with no warnings and authenticates with the ssh key. You may need to delete an entry or two from cpr's known_hosts file and accept the new host key when sshing.
  11. Once the cpr user can ssh properly, run the cpradmin command(s) that cprsetup gave you. Make sure to run them from the ~cpr/admin/ directory, or they will not work!
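Steps 4, 9 and 10 all come down to removing stale known_hosts entries for a reinstalled host. Below is a minimal sketch of that filtering. It is a hypothetical helper and not part of the CPR tools. It assumes host names are stored in plain text. Hashed entries (lines starting with "|1|") cannot be matched by name, so the sketch leaves them alone.

```python
# Hypothetical helper, not part of the CPR tools: drop known_hosts entries
# for a reinstalled host. Assumes unhashed host names; hashed "|1|..." entries
# are kept because they cannot be matched by name.

def _host_names(line):
    """Return the host names/addresses listed on one known_hosts line."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return []
    # A line may start with a marker such as @cert-authority or @revoked.
    if fields[0].startswith("@") and len(fields) > 1:
        host_field = fields[1]
    else:
        host_field = fields[0]
    names = []
    for name in host_field.split(","):
        # Non-default ports are written as [host]:port.
        if name.startswith("[") and "]" in name:
            name = name[1:name.index("]")]
        names.append(name)
    return names


def remove_host_entries(lines, hosts):
    """Return the known_hosts lines that do not mention any name in hosts."""
    if isinstance(hosts, str):
        hosts = [hosts]
    wanted = set(hosts)
    return [line for line in lines if not wanted.intersection(_host_names(line))]
```

Pass both the hostname and the old IP address, because known_hosts may hold either one. In practice, OpenSSH's own `ssh-keygen -R hostname -f ~/.ssh/known_hosts` does the same job and also handles hashed entries.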

Manage the CPR nodes

cpradmin can do almost any administrative task on one or all of the cpr nodes in a particular monitoring mesh. For several tasks, you'll need to add configuration data directly into MySQL. cpradmin then generates the configuration files and updates all of the machines. Some examples are below (they all use the campus monitoring mesh and the campus database on cpr-central):

Most other things you can do with cpradmin follow the examples given above. Multithreading is not heavily tested but should work for most things.

There are several folders in ~cpr/admin/ that contain files for different pieces of functionality:

As shown above, cpradmin is fairly versatile. Below is a list of some of the options and a few gotchas:
cd ~cpr/admin/; ./ -n network [-h host] -a [push|get|configure|new] -c command [-f]
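The usage line above can be read as the following hypothetical argparse model. It is not cpradmin's source, which isn't shown in these notes. It only pins down the syntax: which options are required, the allowed -a values, and the fact that -h means host rather than help. These notes don't say what -f does, so the model only records whether it was given.

```python
import argparse


def build_parser():
    """Hypothetical model of: -n network [-h host] -a [push|get|configure|new] -c command [-f]"""
    # add_help=False frees -h, which this usage line uses for the host.
    parser = argparse.ArgumentParser(prog="cpradmin", add_help=False)
    parser.add_argument("-n", dest="network", required=True,
                        help="monitoring mesh, e.g. campus")
    parser.add_argument("-h", dest="host", default=None,
                        help="a single node; when omitted, presumably every node in the mesh")
    parser.add_argument("-a", dest="action", required=True,
                        choices=["push", "get", "configure", "new"])
    parser.add_argument("-c", dest="command", required=True)
    # The meaning of -f is not documented here; only its presence is recorded.
    parser.add_argument("-f", dest="f", action="store_true")
    return parser
```

Parsing a candidate command line such as `["-n", "campus", "-a", "get", "-c", "CMD"]` with `build_parser().parse_args(...)` catches a missing -c or a misspelled action before anything touches the nodes. Here CMD is a placeholder.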

Building a map of the current network topology

The topobuilder script is not finished yet, but it generally works. First, a file for each router that lists its interfaces needs to be in the ~cpr/topology/ folder. The north interconnect router would have the filename and the contents of the file should be a newline-separated (\n) list of IP addresses on that router. The file can simply be the output of "sho ip int brief" on the router. topobuilder only looks for IP addresses in the file, so this works as long as each line has 0 or 1 IP addresses.

To run topobuilder, pick a network to run on. There are also options for a level and for a file to read from, but neither of these does anything yet. Later on, the level option will allow doing a layer 2 traceroute using the Book Of Knowledge database. The file option will let manual traceroutes be put into a file and used instead of having topobuilder do the traceroutes. Currently, topobuilder must be run on cpr-northcentral, because it uses threading, which is not available on cpr-central.
cd ~cpr/topology/; ./ -n NETWORK
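The "only look for IP addresses" rule for the router files can be sketched as follows. This is a minimal illustration assuming IPv4 addresses. The function name is hypothetical. Raising an error on a line with more than one address is this sketch's choice, not necessarily what topobuilder does.

```python
import ipaddress
import re

# Candidate dotted quads; ipaddress rejects out-of-range ones like 999.1.1.1.
_DOTTED_QUAD = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _valid_ipv4(text):
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


def router_addresses(text):
    """Collect IPv4 addresses from a router file (plain list or 'sho ip int brief' output)."""
    addresses = []
    for line in text.splitlines():
        found = [m for m in _DOTTED_QUAD.findall(line) if _valid_ipv4(m)]
        if len(found) > 1:
            # The notes say each line must hold 0 or 1 addresses.
            raise ValueError(f"more than one address on line: {line!r}")
        addresses.extend(found)
    return addresses
```

Header lines and "unassigned" interfaces in "sho ip int brief" output contain no dotted quad, so they contribute nothing. That is why the raw command output works as an input file.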
Topobuilder will chug along for a while. Once it's done, the graph of the network will be stored in the database. Any newly discovered hosts are added to the Host table, and each link in the network is added to the Link table. Every entry added to the Link table in a given run gets a revision number one greater than the previous run's, so changes in the network can be tracked over time. Note that this tool does not generate any images; use faultfinder for that (see below).
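The revision scheme can be sketched like this. The real tables live in MySQL, and their columns aren't listed in these notes. The Link schema below (src, dst, revision) is therefore a hypothetical stand-in. sqlite3 is used only to keep the example self-contained.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical stand-in for the real MySQL Link table.
    conn.execute("CREATE TABLE IF NOT EXISTS Link (src TEXT, dst TEXT, revision INTEGER)")


def record_links(conn, links):
    """Store one run's links under revision = previous maximum + 1."""
    (previous,) = conn.execute("SELECT COALESCE(MAX(revision), 0) FROM Link").fetchone()
    revision = previous + 1
    conn.executemany("INSERT INTO Link (src, dst, revision) VALUES (?, ?, ?)",
                     [(src, dst, revision) for src, dst in links])
    conn.commit()
    return revision


def links_at(conn, revision):
    rows = conn.execute("SELECT src, dst FROM Link WHERE revision = ?", (revision,))
    return set(rows.fetchall())


def diff_revisions(conn, old, new):
    """Return (added, removed) link sets between two runs."""
    before, after = links_at(conn, old), links_at(conn, new)
    return after - before, before - after
```

MAX(revision) + 1 is only safe if two runs never overlap. If runs could ever run concurrently, a separate run table with an auto-increment ID would be the sturdier choice.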

Performing fault localization on the network

For one semester, I did a project on fault localization. The result is a tool called faultfinder. Here's the short version, with an explanation of how to run it. (If you're just trying to generate a static map of a simple topology, check out ~cpr/faultfinder/. It needs to be run on cpr-northcentral, or the Perl dependencies need to be installed on cpr-central.)

Faultfinder requires a few parameters to run:

Currently this tool is inoperable because it's being moved from test data to real data. Once the changes mentioned above are implemented, it can be run with:
cd ~cpr/faultfinder/; ./ -n NETWORK -l LAYER -m METRIC -t TIMESTAMP -v THRESHOLD
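All five parameters are required, so a small wrapper that refuses to build the command when one is missing can save a confusing run. This is a hypothetical sketch. The script name is elided in these notes, so FAULTFINDER_SCRIPT is a placeholder. Any values you pass in an example call are illustrative, not real layer or metric names.

```python
import shlex

# Placeholder: the faultfinder script name is not given in these notes.
FAULTFINDER_SCRIPT = "./FAULTFINDER_SCRIPT"


def faultfinder_argv(network, layer, metric, timestamp, threshold):
    """Build the argument list for the documented -n/-l/-m/-t/-v invocation."""
    values = {"-n": network, "-l": layer, "-m": metric, "-t": timestamp, "-v": threshold}
    # None or "" counts as missing; 0 is a legitimate value and is kept.
    missing = [flag for flag, value in values.items() if value is None or value == ""]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    argv = [FAULTFINDER_SCRIPT]
    for flag, value in values.items():
        argv += [flag, str(value)]
    return argv
```

On cpr-northcentral, the list could be handed to `subprocess.run(argv, cwd=os.path.expanduser("~cpr/faultfinder"), check=True)`. `shlex.join(argv)` gives a copy-pasteable version.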

Using the distributed data storage system

Datawarehousing project, "Database Clustering writeup".

cpr/distributeddb/: these are all the files for the distributed data storage tool. Nothing needs to be run by hand, but here is what happens:
  1. on the campus cpr nodes copies stuff to the usual location and to ~cpr-data/for_northcentral/campus/
  2. -processing/ runs every minute and inserts stuff into the local database and the cluster
  3. -processing/ runs every now and then and updates the weights in servers.xml so that inserts can be randomized appropriately
  4. -statuspage/ is at
  5. -statuspage/cleanupDB.php is run every now and then and cleans out data older than a day from the database on northcentral

More details are in README.txt there.
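Weighted randomization of inserts can be sketched as below. The layout of servers.xml isn't documented in these notes, so the <server host=... weight=...> schema is an assumption. Python's random.choices stands in for whatever the processing scripts actually use.

```python
import random
import xml.etree.ElementTree as ET

# Assumed layout (hypothetical; the real servers.xml format is not shown here):
# <servers>
#   <server host="db1" weight="3"/>
#   <server host="db2" weight="1"/>
# </servers>


def load_weights(xml_text):
    """Map each server host to its weight."""
    root = ET.fromstring(xml_text)
    return {s.get("host"): float(s.get("weight")) for s in root.iter("server")}


def pick_server(weights, rng=random):
    """Choose a server for the next insert, with probability proportional to its weight."""
    hosts = list(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]
```

A server whose weight is set to 0 never receives inserts. Under this scheme, periodically rewriting the weights is enough to shift load away from a busy or unhealthy node.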