Monday, August 12, 2013

Why Vagrant
Vagrant provides easy-to-configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow, to help maximize the productivity and flexibility of you and your team.


Vagrant on Windows
1. Install VirtualBox and open the VirtualBox Manager
--VirtualBox download link: https://www.virtualbox.org/wiki/Downloads
2. Download Vagrant for Windows
--Vagrant download link: http://downloads.vagrantup.com/
3. Clone the repository containing your Vagrant project from GitHub to your local machine
4. Open a shell in the project directory and run: $ vagrant init
--This creates a Vagrantfile; a minimal sketch is shown after this list
5. Run: $ vagrant up
--This will automatically download the box image and create the VM on your local machine
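
For reference, here is a minimal Vagrantfile sketch; the "precise32" box name and URL are assumptions based on the era's getting-started guide, so substitute whatever box your project uses:

  # Vagrantfile (minimal sketch; box name and URL are assumptions)
  Vagrant.configure("2") do |config|
    config.vm.box     = "precise32"
    config.vm.box_url = "http://files.vagrantup.com/precise32.box"
  end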

PERL MODULES

TAP::Harness
Description: This is a simple test harness which allows tests to be run and results automatically aggregated and output to STDOUT.
Synopsis: 
use TAP::Harness;
my $harness = TAP::Harness->new( \%args );
$harness->runtests(@tests);
Methods:
 my %args = (
    verbosity => 1,
    lib       => [ 'lib', 'blib/lib', 'blib/arch' ],
 );
 my $harness = TAP::Harness->new( \%args );
Verbosity Level:
  1   verbose        Print individual test results to STDOUT.
  0   normal
 -1   quiet          Suppress some test output (mostly failures
                     while tests are running).
 -2   really quiet   Suppress everything but the test summary.
 -3   silent         Suppress everything.
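
Putting it together, a minimal runnable driver script; the t/*.t paths assume the conventional layout where test scripts live under t/:

  use strict;
  use warnings;
  use TAP::Harness;

  my %args = (
      verbosity => 1,
      lib       => [ 'lib', 'blib/lib', 'blib/arch' ],
  );
  my $harness = TAP::Harness->new( \%args );

  my @tests = glob 't/*.t';    # pick up every test script
  $harness->runtests(@tests);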

DBI
The DBI is the standard database interface module for Perl. It defines a set of methods, variables and conventions that provide a consistent database interface independent of the actual database being used.
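Its documentation is extensive; as a hedged sketch of the usual connect/prepare/execute cycle (the SQLite DSN, table, and column names here are hypothetical):

  use strict;
  use warnings;
  use DBI;

  # Hypothetical DSN, table, and column names; substitute your own.
  my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '',
                          { RaiseError => 1, AutoCommit => 1 } );

  my $sth = $dbh->prepare('SELECT id, name FROM users WHERE id = ?');
  $sth->execute(42);
  while ( my ( $id, $name ) = $sth->fetchrow_array ) {
      print "$id: $name\n";
  }
  $dbh->disconnect;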

WWW::Scripter - For scripting web sites that have scripts
Description: This is a subclass of WWW::Mechanize that uses the W3C DOM and provides support for scripting.
Synopsis:
use WWW::Scripter;
  my $w = WWW::Scripter->new;

  $w->use_plugin('Ajax');  # packaged separately
  
  $w->get('http://some.site.com/that/relies/on/ajax');
  $w->eval(' alert("Hello from JavaScript") ');
  $w->document->getElementsByTagName('div')->[0]->....

  $w->content; # returns the HTML content, possibly modified
               # by scripts


HTML::TreeBuilder::XPath - add XPath support to HTML::TreeBuilder
Description: This module adds typical XPath methods to HTML::TreeBuilder, to make it easy to query a document.
Synopsis:
use HTML::TreeBuilder::XPath;
  my $tree= HTML::TreeBuilder::XPath->new;
  $tree->parse_file( "mypage.html");
  my $nb=$tree->findvalue( '/html/body//p[@class="section_title"]/span[@class="nb"]');
  my $id=$tree->findvalue( '/html/body//p[@class="section_title"]/@id');

  my $p= $tree->findnodes( '//p[@id="toto"]')->[0];
  my $link_texts= $p->findvalue( './a'); # the texts of all a elements in $p
  $tree->delete; # to avoid memory leaks, if you parse many HTML documents 

Date::Calc - Gregorian calendar date calculations
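No synopsis was given for this one; as a hedged sketch, three of its commonly used functions (the dates are arbitrary examples):

  use strict;
  use warnings;
  use Date::Calc qw(Today Add_Delta_Days Delta_Days);

  my ( $year, $month, $day ) = Today();                       # current local date
  my @due  = Add_Delta_Days( $year, $month, $day, 30 );       # the date 30 days from now
  my $days = Delta_Days( 2011, 1, 1, $year, $month, $day );   # days elapsed since 2011-01-01
  printf "Due: %04d-%02d-%02d (%d days since 2011-01-01)\n", @due, $days;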

Test::Simple - Basic utilities for writing tests.
Description: This is an extremely simple, extremely basic module for writing tests suitable for CPAN modules and other pursuits. If you wish to do more complicated testing, use the Test::More module (a drop-in replacement for this one).
Synopsis:
 use Test::Simple tests => 1;

  ok( $foo eq $bar, 'foo is bar' );
The basic unit of Perl testing is the ok. For each thing you want to test your program will print out an "ok" or "not ok" to indicate pass or fail. You do this with the ok() function (see below).
The only other constraint is you must pre-declare how many tests you plan to run. This is in case something goes horribly wrong during the test and your test program aborts, or skips a test or whatever. You do this like so:
    use Test::Simple tests => 23;
You must have a plan.
ok
  ok( $foo eq $bar, $name );
  ok( $foo eq $bar );
ok() is given an expression (in this case $foo eq $bar). If it's true, the test passed. If it's false, it didn't. That's about it.
ok() prints out either "ok" or "not ok" along with a test number (it keeps track of that for you).
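A complete, runnable sketch tying the plan and ok() together (the two checks are arbitrary examples):

  use strict;
  use warnings;
  use Test::Simple tests => 2;    # the plan: exactly two tests will run

  my $sum = 2 + 2;
  ok( $sum == 4,        'addition works' );
  ok( 'perl' =~ /perl/, 'pattern matches' );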

Test::More - yet another framework for writing test scripts
Description: The purpose of this module is to provide a wide range of testing utilities: various ways to say "ok" with better diagnostics, facilities to skip tests, test future features, and compare complicated data structures. While you can do almost anything with a simple ok() function, it doesn't provide good diagnostic output.

Before anything else, you need a testing plan. This basically declares how many tests your script is going to run to protect against premature failure.
The preferred way to do this is to declare a plan when you use Test::More.
  use Test::More tests => 23;
There are cases when you will not know beforehand how many tests your script is going to run. In this case, you can declare the plan when you are done running tests.
  use Test::More;

  ... run your tests ...

  done_testing( $number_of_tests_run );
Sometimes you really don't know how many tests were run, or it's too difficult to calculate; in that case you can leave off $number_of_tests_run.
In some cases, you'll want to completely skip an entire testing script.
  use Test::More skip_all => $skip_reason;
Your script will declare a skip with the reason why you skipped and exit immediately with a zero (success). 

If you want to control what functions Test::More will export, you have to use the 'import' option. For example, to import everything but 'fail', you'd do:
  use Test::More tests => 23, import => ['!fail'];
Alternatively, you can use the plan() function. Useful for when you have to calculate the number of tests.
  use Test::More;
  plan tests => keys %Stuff * 3;
or for deciding between running the tests at all:
  use Test::More;
  if( $^O eq 'MacOS' ) {
      plan skip_all => 'Test irrelevant on MacOS';
  }
  else {
      plan tests => 42;
  }
done_testing
    done_testing();
    done_testing($number_of_tests);
If you don't know how many tests you're going to run, you can issue the plan when you're done running tests.
$number_of_tests is the same as in plan(); it's the number of tests you expected to run. You can omit this, in which case the number of tests you ran doesn't matter, only the fact that your tests ran to completion.
This is safer than and replaces the "no_plan" plan.
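As a hedged sketch, a few of the richer assertions combined with done_testing() (the checks are arbitrary examples):

  use strict;
  use warnings;
  use Test::More;

  is( 2 + 2, 4, 'is() reports both values on failure' );
  like( 'Hello, World', qr/World/, 'like() matches a regex' );
  is_deeply( [ 1, 2, 3 ], [ 1, 2, 3 ], 'is_deeply() compares structures' );

  done_testing();    # no up-front plan needed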

Monday, April 11, 2011

Linux / Unix Command: crond

cron - daemon to execute scheduled commands
Description:
Cron should be started from /etc/rc or /etc/rc.local.
Cron searches /var/spool/cron for crontab files which are named after accounts in /etc/passwd; crontabs found are loaded into memory. Cron also searches for /etc/crontab and the files in the /etc/cron.d/ directory, which are in a different format. Cron then wakes up every minute, examining all stored crontabs, checking each command to see if it should be run in the current minute. When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists).

Additionally, cron checks each minute to see if its spool directory's modtime (or the modtime on /etc/crontab) has changed, and if it has, cron will then examine the modtime on all crontabs and reload those which have changed. Thus cron need not be restarted whenever a crontab file is modified.

Cron (a Linux process that performs background work, often at night) is set up by default on your Red Hat system, so you don't have to do anything about it unless you would like to add some tasks to be performed on your system on a regular basis or change the time at which cron performs its duties.

Please note that some of the cron work might be essential for your system functioning properly over a long period of time. Among other things cron may:
- rebuild the database of files which is used when you search for files with the locate command,
- clean the /tmp directory,
- rebuild the manual pages,
- "rotate" the log files, i.e. discard the oldest log files, rename the intermediate logs, and create new logs,
- perform some other checkups, e.g. adding fonts that you recently copied to your system.

Therefore, it may not be the best idea to always switch your Linux machine off for the night--in such a case cron will never have a chance to do its job. If you do like switching off your computer for the night, you may want to adjust cron so it performs its duties at some other time.

To find out when cron wakes up to perform its duties, have a look at the file /etc/crontab, for example:

cat /etc/crontab

It may contain something like this:

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

You can see that there are four categories of cron jobs: performed hourly, daily, weekly and monthly. You can modify those or add your own category. Here is how it works.

The columns in the entries show: minute (0-59), hour (0-23), day of month (1-31), month of year (1-12), day of week (0-6--Sunday to Saturday). The "*" means "any valid value".

Thus, in the example quoted, the hourly jobs run at one minute past every hour. The daily jobs run at 4:02 a.m. each day. The weekly jobs run at 4:22 a.m. on Sundays. The monthly jobs run at 4:42 a.m. on the first day of every month. The directory containing the scripts to be executed is shown as the last entry on each line.

If you wanted your jobs to be performed at noon instead of 4 in the morning, just change the 4s to 12s. Cron wakes up every minute and checks whether /etc/crontab has changed, so there is no need to restart anything after you make your changes.
If you want to add a job to your cron, place a script that runs your job (or a link to your script) in /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, or /etc/cron.monthly.
Here is an example of an entry in /etc/crontab which causes a job to be performed three times a week (Mon, Wed, Fri):

02 4 * * 1,3,5 root run-parts /etc/cron.weekly
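
A custom entry does not have to go through run-parts at all; as a hedged illustration (the script path and log file are hypothetical), this line would run a Perl maintenance script as root at 12:30 a.m. every day:

30 0 * * * root /usr/local/bin/cleanup.pl >> /var/log/cleanup.log 2>&1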

Friday, April 8, 2011

Ubuntu Linux OpenSSH Server installation and configuration

OpenSSH is a FREE version of the SSH connectivity tools that technical users of the Internet rely on. Users of telnet, rlogin, and ftp may not realize that their password is transmitted across the Internet unencrypted, but it is. OpenSSH encrypts all traffic (including passwords) to effectively eliminate eavesdropping, connection hijacking, and other attacks. Additionally, OpenSSH provides secure tunneling capabilities and several authentication methods, and supports all SSH protocol versions.

Install:
$ sudo apt-get install openssh-server openssh-client
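
The title also promises configuration; as a minimal sketch, two commonly adjusted directives in /etc/ssh/sshd_config (both are standard sshd_config options), followed by the restart command:

  # /etc/ssh/sshd_config (excerpt)
  Port 22               # listening port; change to reduce automated scans
  PermitRootLogin no    # disallow direct root logins over SSH

  $ sudo /etc/init.d/ssh restart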

Tuesday, April 5, 2011

Diagnostic Report

ps -ef

List of all running processes

top

List the top resource consumers

env

List all environment variables

find -ls

Recursively list all files and directories

df -hk

Display the file system information

uname -Xa

List information about the server

psrinfo -v

List information about the server's processors

jstack

Get a stack trace of a Java program to see what logic is running or where it is stuck.
Use kill -3 <pid> (SIGQUIT, which makes the JVM print a thread dump) if jstack is not working due to high CPU usage

http://download.oracle.com/javase/6/docs/technotes/tools/share/jstack.html

jinfo

Get all Java runtime parameters such as command line options, classpath, etc.

http://download.oracle.com/javase/6/docs/technotes/tools/share/jinfo.html

jmap

Dump a Java program's memory to a file for memory-leak analysis.

http://download.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
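
A hedged usage sketch of the three JVM tools together (the PID 12345 and output file names are hypothetical):

  $ jstack 12345 > threads.txt                  # thread stack traces
  $ jinfo 12345                                 # runtime flags, system properties
  $ jmap -dump:format=b,file=heap.bin 12345     # binary heap dump for leak analysis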

Tuesday, March 29, 2011

Data integration

Data integration: retrieve data from different sources and assemble it in a unified way.
Data integration focuses mainly on databases. A database is an organized collection of data. It's similar to a file system, which is an organizational structure for files so they're easy to find, access and manipulate.
The most common approach is data warehousing. Using this method, all the data from the various databases you intend to integrate is extracted, transformed, and loaded (ETL). That means the data warehouse first pulls all the data from the various data sources. Then it converts all the data into a common format so that one set of data is compatible with another. Finally, it loads this new data into its own database. When you submit your query, the data warehouse locates the data, retrieves it, and presents it to you in an integrated view.
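As a hedged sketch of that extract-transform-load cycle in Perl (the DSNs, tables, and columns are all hypothetical):

  use strict;
  use warnings;
  use DBI;

  my $src = DBI->connect( 'dbi:Pg:dbname=orders_db', '', '', { RaiseError => 1 } );
  my $dwh = DBI->connect( 'dbi:Pg:dbname=warehouse', '', '', { RaiseError => 1 } );

  # Extract: pull the rows from the source database.
  my $rows = $src->selectall_arrayref(
      'SELECT id, amount_cents, ordered_at FROM orders');

  # Transform: convert cents to the warehouse's common decimal format.
  my @clean = map { [ $_->[0], $_->[1] / 100, $_->[2] ] } @$rows;

  # Load: insert the converted rows into the warehouse's own database.
  my $ins = $dwh->prepare(
      'INSERT INTO fact_orders (id, amount, ordered_at) VALUES (?, ?, ?)');
  $ins->execute(@$_) for @clean;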

A data warehouse is a database that stores information from other databases using a common format.

Descriptions of data are called metadata. Metadata is useful for naming and defining data as well as describing the relationship of one set of data to other sets. Data integration systems use metadata to locate the information relevant to queries.

The warehouse must have a database large enough to store data gathered from multiple sources. Some data warehouses add a further layer called a data mart: the warehouse takes over the duties of aggregating data, while the data mart responds to user queries by retrieving and combining the appropriate data from the warehouse.

One problem with data warehouses is that the information in them isn't always current. That's because of the way data warehouses work -- they pull information from other databases periodically. If the data in those databases changes between extractions, queries to the data warehouse won't result in the most current and accurate views. If the data in a system rarely changes, this isn't a big deal. For other applications, though, it's problematic.

Thursday, March 24, 2011

Load balancing

In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
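As a hedged illustration of the simplest distribution strategy, round-robin over a static pool (the host names are hypothetical):

  use strict;
  use warnings;

  # Round-robin selection over a static pool of servers.
  my @pool = qw(app1.example.com app2.example.com app3.example.com);
  my $next = 0;

  sub pick_server {
      my $server = $pool[$next];
      $next = ( $next + 1 ) % @pool;    # advance, wrapping around the pool
      return $server;
  }

  print pick_server(), "\n" for 1 .. 5;    # app1, app2, app3, app1, app2
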
F5 Networks manufactured and sold one of the very first load balancing products, called BIG-IP. If a server went down or became overloaded, BIG-IP directed traffic away from that server to other servers that could handle the load.
F5's BIG-IP product is based on a network appliance (either virtual or physical), which runs F5's Traffic Management Operating System (TMOS), which runs on top of Linux. This appliance can then run one or more product modules (depending on the appliance selected), which provide the BIG-IP functionality.