Friday, April 28, 2006

Linux Success

Recently, the FAA and Red Hat released some facts about the FAA's move to Red Hat's Enterprise Server systems within the administration. They are claiming a $15 million savings, and more than a 30% operational gain, from using the Linux operating system.

Although this is a press release from Red Hat, our experience is that these numbers are quite typical of the productivity gains and cost savings that come with Linux.

Web Page Design

Keep this in mind: Eye Tracking Web Usage.

Thursday, April 27, 2006

Outside Sales

We were recently contacted by a sales 'person' who sells telecom to clients on the east coast (in a specific region). We worked with this particular individual almost 4 years ago, once, and haven't heard from them, or seen them, since then. Needless to say, the phone call was a bit unexpected.

From what we could tell after a few phone conversations, the company they are currently with, which sells telecom (cell phones, land lines), has decided to stop paying commissions on data services (T1 / T3 lines, remote leasing and hosting of equipment). So, this individual was losing a lot of cash on one of their clients, a $15MM company spending almost $40K / month on data and phone services. We analyzed the client for them, and it looks like we could perform the same set of services for more like $10K - $15K / month (one of the benefits of not being on the East Coast).

It sounds like a win-win, right? Client drops down to almost 1/3 or 1/4 of their monthly costs, we pick up some income, and we pay the sales-person some percentage to take the client. Wrong. A few sticking points:
  • The sales person wanted their percentage (10%) forever. And when they died, they wanted their estate to get it.
  • Even though our firm can provide a more reasonable pricing structure, as the cost of living is lower than on the East Coast, the reality is that this means less money for the sales person.
  • The goal of the sales person became "Find the client's nose-bleed point, and bill them that", which means that the client's best interest is never the most important thing.
The end result, after spending a week or two back and forth with the sales person, was that we weren't right for the job. Instead the sales person found a provider on the East Coast to provide services for the $15MM client. I had to ask, why were we not right for the job?

The answer was veiled, but the reality is this: The chosen service provider was going to charge more, and therefore, the sales person was going to make more. A lot more (double perhaps). Why? You have to remember, management of a system or network can be done from anywhere these days. The difference is the quality of the staff, and rate.

The provider and client are on the East Coast. They charge East Coast rates. We are in central Texas. We charge central Texas rates. The sales person wanted to be paid a percentage of monthly sales. Therefore, what is good for the sales person (a bigger monthly bill) is the opposite of what is good for their client (a smaller one).
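
To put rough numbers on that conflict, here is a minimal sketch of the commission arithmetic. It only uses figures from this post: the 10% rate, our $10K - $15K / month estimate, and the client's current spend of almost $40K / month. The class and method names are made up for illustration, and what the East Coast provider actually charges is not known, so the $40K line is just the client's current bill, not the competing quote.

```java
final class CommissionComparison {

    // Monthly commission, in whole dollars, for a given monthly bill and percentage.
    static long monthlyCommission(long monthlyBillDollars, int percent) {
        return monthlyBillDollars * percent / 100;
    }

    public static void main(String[] args) {
        int percent = 10; // the percentage the sales person asked for
        // Our low estimate, our high estimate, and the client's current monthly spend.
        long[] monthlyBills = {10_000, 15_000, 40_000};
        for (long bill : monthlyBills) {
            System.out.printf("bill $%,d / month -> commission $%,d / month%n",
                    bill, monthlyCommission(bill, percent));
        }
    }
}
```

The commission scales directly with the bill, so every dollar saved for the client costs the sales person ten cents.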

Makes you wonder what the definition, and ethics, of 'this is my client' are?

Graphics Finalized

We finalized the deployment of our PDF image code, using PDFBox, and it works great. The only remaining problem is that we need to develop the ability to detect horizontal versus vertical positioning of the graphic. Our client now has the ability to yank graphics out of PDFs on the fly, and insert them as thumbnails into other PDFs.

Works great!
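
For the horizontal-versus-vertical detection mentioned above, one possible starting point is simply comparing the width and height of the extracted image. This is a sketch, not what we deployed: it assumes "positioning" means landscape versus portrait, that the extracted bytes are in a format javax.imageio can decode (PNG or ordinary JPEG, for example), and the ImageOrientation class is a made-up name.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageOrientation {

    enum Orientation { HORIZONTAL, VERTICAL, SQUARE }

    // Classify by pixel dimensions: wider than tall is horizontal, taller than wide is vertical.
    static Orientation of(int width, int height) {
        if (width > height) return Orientation.HORIZONTAL;
        if (height > width) return Orientation.VERTICAL;
        return Orientation.SQUARE;
    }

    // Decode the raw image bytes (as pulled out of the PDF) and classify them.
    static Orientation of(byte[] imageBytes) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (img == null) {
            // ImageIO.read returns null when no registered reader understands the data.
            throw new IOException("Unsupported or corrupt image data");
        }
        return of(img.getWidth(), img.getHeight());
    }
}
```

This says nothing about a scan that was saved sideways; catching that would need something smarter than a width / height comparison.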

Wednesday, April 19, 2006

Lubbock Christian University

Today I met with a young lady I first met several months ago, when she applied for a job with us as a programmer. At the time, I didn't think she was fully qualified for the position, and thought she needed some more work on her 'base'.

This summer, she needs an internship for Lubbock Christian University, and has decided to move into programming. Turns out that the internship is required -- and of the five companies she tried to work at, we're the only one that responded.

She came in today to talk about her summer project, and to see what she needed to brush up before her start date, May 15th.

Turns out, she also applied at Smooth Fusion. I had forgotten that they even existed.

Monday, April 17, 2006

Efficient Frontier and Optimizing Your Advertising

We signed up with Efficient Frontier today to help one of our clients manage their advertising. They do online, AI-based bidding on Keyword advertising on Google, Yahoo, and Miva. Here's some of the theory:
  1. You bid to show an ad on Google (as an example) for a certain keyword. Example: You want searches for the keyword 'bibliography' to trigger your ad: 'Automatically Generates Bibs, Free Unlimited Usage, Generates Bibs and Footnotes', which should point to your website, www.SpoonFedBib.com, and you want to pay no more than $3.00 per click. (This is a PPC model; Google also has CPM advertising.)
  2. Google then puts your ad onto their site (it is, after all, how they make money), so that when someone else searches for the keyword 'bibliography', the above ad is shown.
  3. Google will use a variety of metrics to determine what PLACE to show the ad. Placement starts at the top of the page (positions 1, 2, and 3), and then works around to the right-hand side of the page (positions 4 through 8).
  4. Google's metrics include Click Thru Ratios (CTRs), relevancy, keyword frequency, and a host of other items. They haven't revealed their exact secrets, but we know from past experience that there are quite a few ingredients.
  5. Once a web user clicks on the ad, your account is charged some fee. The fee is based on those same metrics (i.e., being in the #1 spot doesn't always mean that you are paying the most -- it may be that your ad is the most relevant). They call their mathematics a "company IP issue", and don't really reveal which clicks you've been billed for. Once you've been billed, the user is forwarded to the URL you defined (above, www.SpoonFedBib.com).
  6. The user then ends up on your website, and you need to figure out how to collect their information, or sell them something, or whatever your goal was for spending money to get them to your site. If you're smart about it, your landing page (the first page the user sees) should be developed to maximize the user's experience for the keyword they searched for (in this case, 'bibliography').
Now, the above process, steps 1 - 6, is fairly easy to do. In fact, Google has a nice, simple setup for do-it-yourselfers. Just go over to AdWords and get going. Google will let you set a daily / monthly budget, and you can start running your ads. Google will meter them throughout the day, so you can set it and forget it. So what does this have to do with Efficient Frontier?

To actually use this system to maximum effectiveness, you will need a way to track which clicks convert into sales / leads / information requests, which keywords convert, which days / times convert, what the user was searching for when they clicked on your link, etc.

Once you've started collecting some of this information, it becomes obvious that certain clicks never (or rarely) convert. For instance, you may find out that users who click on your ad, when they search for 'bibliography' at 4am on Monday, never buy anything. But, Google will run the ad, and collect the clicks, even though it never translates into any dollars for your site. What about a more complex case? Clicks when your ad is in position #3 may actually convert more than clicks received when your ad is in position #1. Again, Google is more worried about their bottom line, not yours. They aren't going to drop your ad to position #3 to make sure that you optimize your advertising.
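
As a rough illustration of the kind of tracking described above, here is a minimal sketch that buckets clicks by keyword, day, hour, and ad position, and reports a conversion rate per bucket. It is not how Efficient Frontier or Google does it; the ConversionTracker class and its methods are hypothetical, and a real system would persist the data rather than hold it in memory.

```java
import java.util.HashMap;
import java.util.Map;

final class ConversionTracker {

    // Clicks and conversions seen for one (keyword, day, hour, position) bucket.
    static final class Stats {
        int clicks;
        int conversions;

        double rate() {
            return clicks == 0 ? 0.0 : (double) conversions / clicks;
        }
    }

    private final Map<String, Stats> buckets = new HashMap<>();

    private static String key(String keyword, String day, int hour, int position) {
        return keyword + "|" + day + "|" + hour + "|" + position;
    }

    // Record one click, and whether it later turned into a sale / lead / request.
    void recordClick(String keyword, String day, int hour, int position, boolean converted) {
        Stats s = buckets.computeIfAbsent(key(keyword, day, hour, position), k -> new Stats());
        s.clicks++;
        if (converted) {
            s.conversions++;
        }
    }

    // Conversion rate for a bucket; 0.0 if no clicks have been recorded for it.
    double conversionRate(String keyword, String day, int hour, int position) {
        Stats s = buckets.get(key(keyword, day, hour, position));
        return s == null ? 0.0 : s.rate();
    }
}
```

With data like this, comparing position #3 against position #1 for the same keyword and time slot is a single lookup per bucket.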

This is where Efficient Frontier comes into play. They have built an Artificial Intelligence based back-end which monitors dates / times / keywords / bids / Click Thru Ratios / Conversion Ratios / positions, and can then bid for you to ensure the most bang for the buck. Their system learns from itself, updating itself with the latest and greatest statistics, to ensure that you get the conversions / leads / branding that you are paying Google for.

There are a lot of other competitors on the market, but most of them use "rule" based bidding. Probably the most prominent is Atlas One Point, which is a self-serve system of rule based bidding (i.e., you go online and set up a rule to bid no more than $X between 1pm and 5pm for keyword A - you are billed based on the number of rules that you have).
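
To make "rule" based bidding concrete, here is a minimal sketch of the kind of rule described above: cap the bid for a keyword during a time window, and fall back to a default cap otherwise. This is not Atlas One Point's actual system or API; RuleBasedBidder, its methods, and the dollar amounts in the usage note are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RuleBasedBidder {

    // "Bid no more than maxBid for keyword between startHour (inclusive) and endHour (exclusive)."
    static final class Rule {
        final String keyword;
        final int startHour;
        final int endHour;
        final double maxBid;

        Rule(String keyword, int startHour, int endHour, double maxBid) {
            this.keyword = keyword;
            this.startHour = startHour;
            this.endHour = endHour;
            this.maxBid = maxBid;
        }

        boolean matches(String kw, int hour) {
            return keyword.equals(kw) && hour >= startHour && hour < endHour;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final double defaultMaxBid;

    RuleBasedBidder(double defaultMaxBid) {
        this.defaultMaxBid = defaultMaxBid;
    }

    void addRule(Rule rule) {
        rules.add(rule);
    }

    // The first matching rule wins; with no match, the default cap applies.
    double maxBidFor(String keyword, int hour) {
        for (Rule rule : rules) {
            if (rule.matches(keyword, hour)) {
                return rule.maxBid;
            }
        }
        return defaultMaxBid;
    }
}
```

For example, a bidder with a $3.00 default and a rule of ("bibliography", 13, 17, 1.50) caps 'bibliography' bids at $1.50 from 1pm up to 5pm and at $3.00 the rest of the day. The rules never change unless someone edits them, which is the difference from a system that learns from conversion data.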

New Bank

The bank officer that we like to utilize when working with money moved over to a new bank (Peoples Bank). After meeting with him for a few hours at his new location, last week, we decided to switch banks. So far, it has gone off without a hitch, and the new bank has been extremely helpful.

As for the old bank, Plains Capital Bank, I haven't heard from the officer who took over our accounts, which must mean that meeting us isn't that important to them.

[Side note: We are responsible for about 0.05% of PCB's business. We are responsible for almost 1.0% of the new bank's business.]

PDF Graphic Conversion

As promised, here is some Java code which can pull a graphic out of a PDF and convert it to an array of bytes. You can then do whatever you like with it. Code is not final, but getting close.

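import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// PDFBox classes. These package names assume a PDFBox 0.7.x jar (org.pdfbox);
// later Apache releases moved them under org.apache.pdfbox.
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDPage;
import org.pdfbox.pdmodel.PDResources;
import org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

// Returns the contents of the file in a byte array.
public static byte[] getBytesFromFile (File file) throws IOException {
    // Get the size of the file
    long length = file.length ();

    // Before converting to an int type, check
    // to ensure that file is not larger than Integer.MAX_VALUE.
    // Throw instead of returning null, so callers never wrap a null array.
    if (length > Integer.MAX_VALUE) {
        throw new IOException ("File is too large: "+file.getName ());
    }

    // Create the byte array to hold the data
    byte[] bytes = new byte[(int)length];

    InputStream is = new FileInputStream (file);
    try {
        // Read in the bytes
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length && (numRead=is.read (bytes, offset, bytes.length-offset)) >= 0) {
            offset += numRead;
        }

        // Ensure all the bytes have been read in
        if (offset < bytes.length) {
            throw new IOException ("Could not completely read file "+file.getName ());
        }
    } finally {
        // Close the input stream, even if the read failed
        is.close ();
    }

    return bytes;
}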

/*
* This method reads a PDF from a ByteArrayInputStream (and assumes it's a PDF).
* Once open, all of the images are located on the page (by the pageNumber).
* The image specified (by the imageNumber) is returned.
*
* In this case, pageNumber and imageNumber are both "1" based.
* Returns null if the stream is null, or the page has no such image.
*/
private byte[] fetchImageFromPDF (ByteArrayInputStream inp, int pageNumber, int imageNumber) throws Exception {
    if (inp == null)
        return null;

    PDDocument document = null;
    try {
        document = PDDocument.load (inp);

        List pages = document.getDocumentCatalog ().getAllPages ();
        PDPage page = (PDPage) pages.get (pageNumber-1);
        PDResources resources = page.getResources ();
        Map images = resources.getImages ();
        if (images != null) {
            Iterator imageIter = images.keySet ().iterator ();
            int i = 1;
            while (imageIter.hasNext ()) {
                // Advance the iterator on every pass, so that i tracks the current image.
                // (Calling next() only on a match always returned the first image.)
                String key = (String) imageIter.next ();
                if (i == imageNumber) {
                    PDXObjectImage image = (PDXObjectImage)images.get ( key );
                    ByteArrayOutputStream baOut = new ByteArrayOutputStream ();
                    image.write2OutputStream (baOut);
                    // String name = "/tmp/out12";
                    // System.out.println ( "Writing image:" + name + "." + image.getSuffix ());
                    // image.write2file ( name );
                    return baOut.toByteArray ();
                }
                i++;
            }
        }
    } catch (Exception e) {
        // Keep the original exception as the cause, so the stack trace is not lost
        throw new Exception ("Exception fetching image from byteArray PDF: " + e.getMessage (), e);
    } finally {
        // Release the document's resources, whether or not an image was found
        if (document != null)
            document.close ();
    }

    return null;
}

/*
*
* This method is a wrapper around the byteArray version above, taking a File object instead.
* It calls getBytesFromFile () to read the file into memory first.
*
*/
private byte[] fetchImageFromPDF (File f, int pageNumber, int imageNumber) throws Exception {
    try {
        if (f == null)
            return null;

        byte[] bytes = getBytesFromFile (f);
        ByteArrayInputStream inp = new ByteArrayInputStream (bytes);

        return this.fetchImageFromPDF (inp, pageNumber, imageNumber);
    } catch (Exception e) {
        // Keep the original exception as the cause
        throw new Exception ("Exception fetching image from file PDF: " + e.getMessage (), e);
    }
}

Sunday, April 16, 2006

Whoa .... What Is This?

This 'consultant' absolutely cracks me up. Jeffrey's been hanging around on the Food Service Forums recently, and apparently likes to rip a lot off of Seth Godin and other Internet visionaries.

For some reason, I don't find a completely unusable website (try to click on the scroll bar to scroll, left-click on any link, or right-click anywhere) very telling of a successful marketer. The blog is ... well, you can form your own opinion.

Thumbnails of PDFs to place into PDFs

On Thursday a client gave us a project that was a new kind of task for us. They have scans of documents, which are individual images that have been converted into PDFs. They need a program which can take these images, convert them into smaller thumbnails, and then insert the thumbnails into a new document. All of this needs to be generated on the fly, of course.
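
The "convert them into smaller thumbnails" step can be sketched with nothing but the standard java.awt classes, once an image has been decoded into a BufferedImage. This is an illustration under assumptions, not our finished code: the Thumbnailer class and the maxSide parameter are made up, and the extraction from (and insertion into) the PDF is not shown here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class Thumbnailer {

    // Scale the image down so that its longest side is at most maxSide pixels,
    // keeping the aspect ratio. Images that are already small enough are not enlarged.
    static BufferedImage thumbnail(BufferedImage source, int maxSide) {
        int w = source.getWidth();
        int h = source.getHeight();
        double scale = Math.min(1.0, (double) maxSide / Math.max(w, h));
        int tw = Math.max(1, (int) Math.round(w * scale));
        int th = Math.max(1, (int) Math.round(h * scale));

        BufferedImage thumb = new BufferedImage(tw, th, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default when shrinking.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, tw, th, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }
}
```

The result can be written back out with javax.imageio.ImageIO.write before being handed to whatever builds the new PDF.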

We typically develop using open-source technologies, where available. In this instance, we've been utilizing iText for most PDF document processing. However, iText doesn't do extraction from documents very well. It looks like a commercial product, like JPedal, Aspose, Argus, Big Faceless or PDF Tools, would run anywhere between $599 and $1299 for a single server instance. In this case, that's more than the client wants to pay.

As a last-ditch effort, we found PDFBox, which is an Open Source project and has some good pre-packaged tools. Documentation is very, very sparse.

We started coding the solution on Friday, and should be done by Monday afternoon. If it's of any interest, I'll post up some of the solution.

Thursday, April 13, 2006

Broken Doors

We hired some 'local' help to work on our yard, which is in need of some severe repair. Specifically, we have 'goat heads' and 'spreader stickers', which are about 1cm - 2cm long and will stick in your foot, your animals' feet, clothing, and bike tires, and they HURT.

I suppose, though it's un-verified at the moment, that a stray rock from the mowing or edging got kicked up and absolutely killed our front door, which is glass. The spot where it hit was towards the bottom left corner, and it spidered all the way up to the top. When I got home, several hours later, it was still spidering and cracking.

Looks like we'll be adding some new doors this weekend. Probably one to the front, and one to the back, with sliding screens, so we can see and hear the children playing.

However, the yard looks wonderful. If anyone needs some quality yard work done in the Lubbock area, let me know.

Sprinklers: We want to add an automatic sprinkling system in the next month or so. Does anyone have any recommendations for the Lubbock area?

Wednesday, April 12, 2006

SpoonFedBib.com

After more than 3 months of development, we are only a few weeks away from releasing our first public product, www.SpoonFedBib.com. We've integrated the Library Of Congress, built AJAX functionality in, and can handle bibliographies in MLA, APA, and Turabian formats. It will be a no cost website, with a few unique twists, and similar to something like EasyBib or NoodleTools. You'll have to wait a while longer to see what it's actually going to be, but it will be released soon.

In the meantime, we had to go find a new host. OLM decided to take their merry time to work on the problems they were having with Tomcat. The last issue was between mod_jk and the various Connectors. Apparently, the admin that installed it is no longer there. And no one else knows what the heck is going on. As of today, it's been almost 8 days since I notified them of the problem, and I've called back three times. Oh well.

Kattare is our new provider of choice, and they are, so far, awesome. Their interface is quite slick, and I love that they are using what appears to be a polished mix of open source and custom tools. We won't be deploying anything until tomorrow, at the earliest, but I'll update here as we deploy.

Friday, April 07, 2006

Degrees and MBAs

Since we now have a few employees who are college bound, we've been looking at ways to make the college transition work smoothly for us and our employees. Note to budding CS majors: My high-school employees can run circles around my degreed employees. Not sure why, but I think it's passion.

As a test, I started looking at places that I'd want to finish up my degree at. One possibility is to go back to Texas Tech for a few semesters. However, the last few classes that I need to take are all in the middle of the day, and missing one or two would mean failing the whole class. I own a company with clients and several products almost out the door, and missing 4 hours of the middle of the day is just not feasible.

Neumont University came about recently -- or, I started seeing ads for it recently (all over Slashdot). There's not much online about it, other than a few Microsoft employees who have given presentations there (it appears to be co-owned by Microsoft and IBM). Neumont might be a possibility for the CS employees.

University of Phoenix is, of course, advertising everywhere. However, I have more than 120 college credits, and they'll only transfer about 30 of them. Which means it will take me about $40,000 and 3 years to finish the degree. Why would I bother? And it's not cost-effective for recent employees.

I called and spoke with the Art Institute Online on behalf of our graphic artists. They have some interesting programs, but fall into the same "traps" as the others. Don't transfer hours, charge a lot per-hour, want up-front fees to verify MY information.

Westwood College is another one we checked out. Of all of the programs, I liked theirs the most -- perhaps it was because of the sales guy, or perhaps it was their alumni retraining program (if you graduate in CS from their school, you can return for re-training at no cost, for the rest of your life). Again, too much money, not enough transfer credits.

We've decided that the online schools still aren't quite there. The curriculum is good, but the cost factor is still astronomical for an e-learning place (in my opinion).

Interestingly enough, in looking at the various options, I ran across a post today from Seth Godin, referencing Josh Kaufman's Personal MBA. I own about 8 of those books (he did pick some good ones). I just might saddle up and see what I can do about getting the rest of them.

Tuesday, April 04, 2006

Outsourcing

We're slowly treading into some outsourcing waters. Heck, we're even trying offshoring.

One of our clients needed some 3D animations (DVDs) built in a short amount of time, and wants to re-sell them for a very low price. Lower than I could have them built for in the United States. We've managed to locate two different providers (one in Romania, and one in India) who have developed some of the animations for us, at about half the price of what we could do it for. Now we are managing the outsourcing / offshoring companies.

We're now trying some slightly larger projects, which we have some cash for, but no time or manpower to develop. So, our goals are twofold:
  1. Lower Cost: We could hire an intern or part-time developer to help us out, but right now we are working on as many projects as we can work on. So, we want to keep the costs lower than what it COULD cost us to hire someone.
  2. Speed: I could table the project, but that could also be a bad thing. Instead, we want to try and get the project out the door, not in a great rush, but not at a creeping pace either.
So, we posted a project on eLance the other day, and ended up with about 13 or 14 bids, from all over the United States, India, Eastern Europe, and Pakistan. The bottom-end of the quotes was around $1,800. The top end was around $63,000. The average bid was $7,800. Times ranged from 6 weeks to 28 weeks (7 months). Average was about 3 months. A graph of the time vs. cost would likely be very interesting.

We've narrowed down the quotes to three. The first is at $3,200 by a provider who has a 4.7 (or so) ranking, and has done more than $250,000 in business on eLance -- they can build it in 8 weeks. The second is at $5,000 by a provider who has a 5.0 (or so) ranking, and has done more than $500,000 in business on eLance -- they can build it in 10 weeks. The third is at $11,000, and has done more than $280,000 in business on eLance -- they can build it in 12 weeks.

Now, the question becomes, which one do I choose?

Sequoia and C-JDBC

Getting technical for a few minutes.

We are in the middle of a project by which we will be building a clustered web and database solution for a customer. This customer is looking for:
  • Failover - Ensuring that if one machine breaks, the next one (or two, or three....) keeps running without any hiccups.
  • Load Balancing - If there are two or three back-end servers in the cluster, then the requests to the database should be able to go to any of them.
  • Reliability - The system can keep running for a long time, and when there is a hiccup (they DO happen), the system can recover fast and keep running.
  • Ease Of Backups - Backing up a server can take some time, and as a system grows, so does the backup and restore time. Clusters make this even more difficult.
  • Database Independence - If the customer decides that their database is junk, or expensive (like Informix, in this case), they should be able to move to another database with minimal effort.
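
The failover and load-balancing requirements above boil down to "spread requests across the back-ends, and skip any that are down". Here is a minimal sketch of that idea as round-robin selection with a health check. It is not how C-JDBC / Sequoia or pgpool implement it; RoundRobinBalancer, the backend names, and the isHealthy predicate are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

final class RoundRobinBalancer {

    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        this.backends = new ArrayList<>(backends);
    }

    // Returns the next backend in rotation that passes the health check,
    // or null if every backend is currently down.
    String pick(Predicate<String> isHealthy) {
        for (int attempt = 0; attempt < backends.size(); attempt++) {
            // floorMod keeps the index valid even after the counter overflows.
            int i = Math.floorMod(next.getAndIncrement(), backends.size());
            String candidate = backends.get(i);
            if (isHealthy.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

This only covers picking a live server for a request. Keeping the databases behind it in sync, and bringing a failed node back, is the hard part.
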
We've been using C-JDBC (now Sequoia) for about a year, but hadn't had a chance yet to really kill it in testing. So, we decided we would before we put anything into production using it. Needless to say, it didn't handle the various scenarios that well.

We brought up a few instances of Sequoia, got it reading off of a few databases (in RAID-1 mode), and started hammering it with a few thousand inserts. That worked, but was slow, with the average insertion taking about 40% longer than an insertion into a single database (we had expected some slow-down on the insertions). Selects, updates, insertions, and deletions all worked as expected.

Things started getting out of hand when we started unplugging machines from the network, and taking backups. It seems that Sequoia uses JGroups in the background to handle the master/slave communication. That's fine. However, we found a lot of problems came up if the network between the master/slave was slow, or dropped packets. The end result was very undefined, and could end up with machines out of sync or un-responsive.

Furthermore, if a machine goes down, and you want to bring it back online, you're in for a headache. The JGroups layer appears to bring the machine back into play just fine, but the machine will need to be synced back up. To do this, you have to use the C-JDBC / Sequoia backuper tool. This backuper uses either Octopus (which we couldn't ever get to work, with or without Sequoia), or the native backup tool (in this case, pg_dump).

So, a Sequoia restore consists of restoring a complete dump, and then restoring the meta-data log that Sequoia keeps. The meta-data log will ensure that the database gets back in sync, without taking down the running instances. This can take hours, or longer. Another issue is specific to pg_dump (something I have an issue with anyway), which cannot take an incremental backup. Instead, it dumps the whole database into a single file (yes, you can split it up, manually) -- and transferring that file back and forth between the nodes (to do a restore) can take some time.

We encountered a lot of instabilities when we took down nodes while it was running. Sometimes we would have to do a full restore. Sometimes we would have to restart the servers entirely, and reload the entire cluster. Overall, the package is fine ... but not yet production ready. A bigger issue becomes support. Community support is OK for a project -- but at the cluster / enterprise level, commercial support is probably a good thing.

We are now looking at native PostgreSQL implementations. However, with support in mind, we're looking at EnterpriseDB. We already have it up and running, with Slony-I, and using pgpool on the front-end as a load-balancer. We're quite happy with it at the moment. It does have its own quirks, however.

More posts later on the PostgreSQL solution.