Sunday, 24 February 2008

Creating a web application from scratch, backed by Fedora-Commons and Apache Solr (Part 1)

(Part 1 will detail the installation and setup of the basic system, services and libraries needed for a Fedora-Commons/Apache Solr backed web 'service'. Subsequent parts will deal with configuring and feeding the search engine, and constructing a web interface to handle article/blog/comment posting and using OpenID for authentication.)

Step 1 - Get a nice clean Linux distribution focused on server use.

I am using Ubuntu JeOS, as it will be hosted on a VMware virtual machine. This also means that the walkthrough that follows will be very Debian-specific.

See the following pages for help:

This page aims at documenting how to create a virtual appliance using Ubuntu Server Edition's JeOS.

This page has snapshots of the entire install process and also information on how to set up a LAMP stack, but if you want to follow this guide, don't install any of the applications it asks you to. It is only useful for our purposes up until the first reboot, before the guide does things we don't need, like activating the root user account or adding PHP.

Note that the user name I will be using in this guide is simply 'user' and I will refer to this as either 'user' or 'username'. Replace this with whatever the username was that you chose during installation.

Step 2 - install it and set up networking and firewalls

Now, firewalls aren't as critical as you might think, especially if you have installed something like JeOS which has nothing really running as default. But there are some very handy tricks to help stop abuse from malicious script kiddies.

For example, my favorite two-liner, which I always add to the iptables firewall script, rate-limits ssh access to 3 attempts every 180 seconds. (Note that the following doesn't immediately ACCEPT the connection; it passes it into an iptables chain called 'TRUSTED', which deals with what may be genuine attempts at access. If you wanted to just accept it, change TRUSTED to ACCEPT.)

# Rate limit SSH attempts.
iptables -A INPUT -p tcp -m tcp --dport ssh -m state --state NEW \
-m recent --hitcount 3 --seconds 180 --update -j DROP

# Allow first attempts through
iptables -A INPUT -p tcp -m tcp --dport ssh -m state --state NEW \
-m recent --set -j TRUSTED

NB Something along these lines would be fine to rate-limit upload attempts to Fedora as well.
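
To make the behaviour of those two rules easier to reason about, here is a small Python model of them. It is a simplified sketch of how I understand the 'recent' match to behave, not a description of the kernel module: the class name RecentLimiter and its method are made up for illustration, and timestamps are passed in by hand so it can be run without touching a firewall.

```python
from collections import defaultdict


class RecentLimiter:
    """Toy model of the two iptables 'recent' rules (hypothetical helper)."""

    def __init__(self, hitcount=3, seconds=180):
        self.hitcount = hitcount
        self.seconds = seconds
        self.seen = defaultdict(list)  # source address -> attempt timestamps

    def new_connection(self, addr, now):
        """Return 'DROP' or 'TRUSTED' for a NEW connection from addr at time now."""
        recent = [t for t in self.seen[addr] if now - t <= self.seconds]
        if len(recent) >= self.hitcount:
            # First rule: too many recent attempts. --update also records
            # this attempt, so hammering the port keeps the block alive.
            self.seen[addr] = recent + [now]
            return "DROP"
        # Second rule: --set records the attempt and passes it to TRUSTED.
        self.seen[addr] = recent + [now]
        return "TRUSTED"


limiter = RecentLimiter()
results = [limiter.new_connection("192.0.2.1", t) for t in (0, 10, 20, 30, 400)]
print(results)
```

In this model the first three attempts get through, the fourth inside the window is dropped, and an attempt made after a quiet 180 seconds is let through again.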

[Edit: seems there was something awry with the following script - as illustrated here. Thanks to the Rubric team. I've added their fix :) But their fix may not be enough, so I'd advise not applying this until everything is installed and working correctly! I'll test it out as soon as I can. ]
Full example firewall script, including this snippet and port opening for tomcat, http/https, and ssh. (As this is from my home server, it'll include a few other services that you may not need for this walkthrough).

If you wish to use the SSL connection to Fedora on port 8443, remember to open that port as well!

Step 3 - Install all updates and get the basic applications we will need

Get a root prompt on a commandline:


[user@server]$ sudo -s

Then make sure that a) you can connect to the internet and b) the server is up to date:

[root@server]# apt-get update
[..... lots of lines of stuff ....]
Hit gutsy-updates/multiverse Sources [1708B]
Fetched 278kB in 0s (319kB/s)
Reading package lists... Done

[root@server]# apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
[ whatever packages that need to be upgraded will be listed here]
X upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 2497kB of archives.
After unpacking 16.4kB of additional disk space will be used.
Do you want to continue [Y/n]? Y
[ .... lots of lines of packages installing hopefully without error ..... ]

Hopefully, once those are installed, your machine will be up to date. Now to install all the necessary packages:

[root@server]# apt-get install build-essential python-dev mysql-server sun-java5-jdk openssh-server python-mysqldb python-pysqlite2

Just let those install, but be aware that certain packages will ask you for information during installation: MySQL asks for a default root password, and a prompt will ask whether you agree with the Sun licence for Java.

Also, note that at this point, you should be able to SSH into the machine to continue working on it. It makes it a lot easier to cut and paste from guides if you do!

Step 4 - install some python libraries using Easy Install

Go here: and download and install the script as it shows. Simply running it as root should do the trick:

[root@server]# wget
[root@server]# python
[Edit: I originally wrote the first part of this for Fedora 2.2, but since the REST API for Fedora 3.0 looks pretty damn usable, I've rewritten this guide for version 3. Removing the installation/configuration for the SOAP client drastically reduces the library dependencies this needs.]

So, we need to install some Python libraries for later: an iCalendar format library (vobject) and an OpenID consumer library (python-openid). We also install other miscellaneous things, such as a library that can generate UUIDs and a very good web framework called Pylons:

[root@server]# easy_install python-openid
[root@server]# easy_install uuid
[root@server]# easy_install vobject
[root@server]# easy_install pylons
(NB we have already installed the python libraries to interact with MySQL and SQLite with the apt-get install command earlier. It is best to install the latest stable packages for the items above, which is why they are installed through easy_install.)

Step 5 - Get Fedora-Commons and Apache Solr.

Either just blindly download the packages I tell you to:

[user@server]$ wget
[user@server]$ wget

Or better, download them from the homepages of the projects themselves, using links2

Install a text-based web-browser and browse and download the packages that way (A manual page for links2):

[root@server]# apt-get install links2

When that has finished installing, you can drop out of your root session (press Ctrl+D, or type 'exit') and download the relevant applications:

[root@server]# exit
[user@server]$ links2

Don't be alarmed, it's meant to blank the screen! Press the letter 'g' and a location bar prompt will appear.

First let's go to the Fedora Commons site, so type in '' and press enter. Use the cursor keys to go down and click (press return) on the 'Download Fedora 3.0 beta 1' link (24/02/2008). Scroll down a bit, and you should see a link to download the installer. You will be presented with the 'save jar file' dialog, so save the Fedora installer jar file.

Now, let's get the search appliance, Solr. Go to '' and click on the 'download' link. Choose a mirror, go into the 1.2/ folder on that mirror and download the 'apache-solr-1.2.0.tgz' file. Press 'q' to quit links2.

Step 6 - Make the server environment ready for Fedora Commons

If you now list the home directory, you should see something like this:

[user@server]:~$ ls
apache-solr-1.2.0.tgz doc fedora-3.0b1-installer.jar

We will need the following:
  1. A directory to store Fedora's root directory (config files, logs, libraries, and default Tomcat instance)
  2. A mysql database and account for Fedora to use
  3. (Optional) A large filesystem to hold Fedora's data storage directory
Point 1 then - I chose to store the Fedora root directory at /opt/fedora30b1 -

[user@server]$ sudo -s
[root@server]# mkdir /opt/fedora30b1

Let the user own it: (Remember to change 'user' to whatever your user is actually called!)

[root@server]# chown user:user /opt/fedora30b1

(Optional) And to aid upgrading, create a symlink at /opt/fedora to this folder:

[root@server]# ln -s /opt/fedora30b1 /opt/fedora

Fedora needs certain environment variables to be set up now, FEDORA_HOME and JAVA_HOME at the very least. Open up the system wide profile (/etc/profile) and add them in there. (I'm using the nano editor, vim is also available from a default JeOS install.)

[root@server]# nano -w /etc/profile

And add the following lines to the end of the file (also, note that there *must not* be any gaps either side of the '=' character, as tempting as it might be to press space to space it out to look better.):

# If you did not create the symlink, just point directly at your Fedora root
# or if you did do the 'ln -s ...' step, use this instead:


# If you did not create the symlink, just point directly at your tomcat root
# or if you did do the 'ln -s ...' step, use this instead:


export JAVA_HOME

Save the file and exit (Ctrl-X in nano, then 'Y' to confirm the save).

Now, to check that this has worked, type the command 'exit' a few times to log out and then log back in again as your default user. If things have worked well, the following commands should work:

[user@server]$ echo $FEDORA_HOME

[Or '/opt/fedora' depending on what you chose.]

[user@server]$ echo $JAVA_HOME

Now to sort out MySQL. Remember that default root password you set for MySQL? You'll need it now.

[user@server]$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.0.45-Debian_1ubuntu3.1-log Debian etch distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.


Now issue the following commands:

mysql> create database fedora30;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on fedora30.* to 'fedoraAdmin'@'localhost' identified by 'PUTYOURPASSWORDHERE';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

mysql> ALTER DATABASE fedora30 DEFAULT COLLATE utf8_bin;
Query OK, 1 row affected (0.00 sec)

mysql> exit


(NB You may or may not need to add the UTF-8 configuration lines for your particular version of MySQL, but as far as I know, the commands are harmless if you don't need them and utterly crucial if you do. Well, crucial unless you are dealing purely with ASCII, but could you really guarantee that?)

Step 7 - install Fedora commons 3.0b1

(Note - Official installation guide is here)

Go to the location where you saved the fedora installer, probably the user's home directory and run the installer. I'll include the entire installation dialog here. Where the response is blank, I simply pressed enter to accept the default.

[user@server]$ cd /home/user
[user@server]$ java -jar fedora-3.0b1-installer.jar

Fedora Installation

To install Fedora, please answer the following questions.
Enter CANCEL at any time to abort the installation.
Detailed installation instructions are available at:

Installation type
The 'quick' install is designed to get you up and running with Fedora
as quickly and easily as possible. It will install Tomcat and an
embedded version of the McKoi database. SSL support and XACML policy
enforcement will be disabled.
For more options, including the choice of hostname, ports, security,
and databases, select 'custom'.
To install only the Fedora client software, enter 'client'.

Options : quick, custom, client

Enter a value ==> custom

Fedora home directory
This is the base directory for Fedora scripts, configuration files, etc.
Enter the full path where you want to install these files.

Enter a value [default is /opt/fedora] ==>

Fedora administrator password
Enter the password to use for the Fedora administrator (fedoraAdmin) account.


Fedora server host
The host Fedora will be running on.
If a hostname (e.g. is supplied, a lookup will be
performed and the IP address of the host (not the host name) will be used
in the default Fedora XACML policies.

Enter a value [default is localhost] ==>

Authentication requirement for API-A
Fedora's management (API-M) interface always requires user authentication.
Require user authentication for Fedora's access (API-A) interface?

Options : true, false

Enter a value [default is false] ==>

SSL availability
Should Fedora be available via SSL? Note: this does not preclude
regular HTTP access; it just indicates that it should be possible for
Fedora to be accessed over SSL.

Options : true, false

Enter a value [default is true] ==>

SSL required for API-A
Should API-A be accessible exclusively via SSL? If true, requests
to access API-A URLs will be automatically redirected to the secure port.

Options : true, false

Enter a value [default is false] ==>

SSL required for API-M
Should API-M be accessible exclusively via SSL? If true, requests
to access API-M URLs will be automatically redirected to the secure port.

Options : true, false

Enter a value [default is true] ==> false

Servlet engine
Which servlet engine will Fedora be running in?
Enter 'included' to use the bundled Tomcat 5.5.23 server.
To use your own, existing installation of Tomcat, enter 'existingTomcat'.
Enter 'other' to use a different servlet container.

Options : included, existingTomcat, other

Enter a value [default is included] ==> included

Tomcat home directory
Please provide the full path to your existing Tomcat installation, or
the path where you plan to install the bundled Tomcat.

Enter a value [default is /opt/fedora/tomcat] ==>

Tomcat HTTP port
Which HTTP port (non-SSL) should Tomcat listen on? This can be changed
later in Tomcat's server.xml file.

Enter a value [default is 8080] ==>

Tomcat shutdown port
Which port should Tomcat use for shutting down? Make sure this doesn't
conflict with an existing service. This can be changed later in Tomcat's
server.xml file.

Enter a value [default is 8005] ==>

Tomcat Secure HTTP port
Which port (SSL) should Tomcat listen on? This can be changed
later in Tomcat's server.xml file.

Enter a value [default is 8443] ==>

Keystore file
For SSL support, Tomcat requires a keystore file.
If the keystore file is located in the default location expected by
Tomcat (a file named .keystore in the user home directory under which
Tomcat is running), enter 'default'.
Otherwise, please enter the full path to your keystore file, or, enter
'included' to use the the sample, self-signed certificate) provided by
the installer.
For more information about the keystore file, please consult:

Enter a value ==> included

Policy enforcement enabled
Should XACML policy enforcement be enabled? Note: This will put a set of
default security policies in play for your Fedora server.

Options : true, false

Enter a value [default is true] ==> false

Enable Resource Index
Enable the Resource Index?

Options : true, false

Enter a value [default is false] ==> true

Enable the REST-API? The REST-API is an EXPERIMENTAL feature that exposes
the Fedora API with a REST-style interface. In particular, URL endpoints
should not be considered final, nor has policy enforcement been evaluated.
For more information about the REST-API, see

Options : true, false

Enter a value [default is false] ==> true

Please select the database you will be using with
Fedora. The supported databases are McKoi, MySQL, Oracle and Postgres.
If you do not have a database ready for use by Fedora or would prefer to
use the embedded version of McKoi bundled with Fedora, enter 'included'.

Options : mckoi, mysql, oracle, postgresql, included

Enter a value ==> mysql

MySQL JDBC driver
You may either use the included JDBC driver or your own copy.
Enter 'included' to use the included JDBC driver, or, enter the location
(full path) of the driver.

Enter a value [default is included] ==>

Database username
Enter the database username Fedora will use to connect to the Fedora database.

Enter a value ==> fedoraAdmin

Database password
Enter the database password Fedora will use to connect to the Fedora database.


Please enter the JDBC URL.

Enter a value [default is jdbc:mysql://localhost/fedora30?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true] ==>

JDBC DriverClass
Please enter the JDBC driver class.

Enter a value [default is com.mysql.jdbc.Driver] ==>

Successfully connected to MySQL
Deploy local services and demos
Several sample back-end services are included with this distribution.
These are required if you want to use the demonstration objects.
If you'd like these to be automatically deployed, enter 'true'.
Otherwise, the installer will put the files in your FEDORA_HOME/install
directory in case you want to deploy them later.

Options : true, false

Enter a value [default is true] ==>

Preparing FEDORA_HOME...
Configuring fedora.fcfg
Installing beSecurity
Installing Tomcat...
Preparing fedora.war...
Processing web.xml
Deploying fedora.war...
Deploying fop.war...
Deploying imagemanip.war...
Deploying saxon.war...
Deploying fedora-demo.war...
Installation complete.

Before starting Fedora, please ensure that any required environment
variables are correctly defined
For more information, please consult the Installation & Configuration
Guide, located online at or locally at

And that should merrily go away and install and set up Fedora and the bundled Tomcat server for you. Unlike other services you may install, this won't start the Fedora service, nor will it create a handy startup/shutdown script that integrates with your Linux startup scripts in /etc/init.d. We will create one later on.

Step 8 - Further configuration of Fedora 3.0

!IMPORTANT! Fix the broken 'mail.jar' library! (Broken, as in the REST API will not work correctly with the version released in 3.0b1.)

Get it from here: and use it to replace the mail.jar found in $FEDORA_HOME/tomcat/webapps/fedora/WEB-INF/lib/mail.jar. Restart Tomcat if you need to.

I am keen on UUIDs, and I cannot see a good reason for not using them. I suggest using the fedora id 'namespace' of uuid, so that a fedora URI will look like <info:fedora/uuid:d3733f61-1083-4a3e-b914-5a853c42189b>

It is also trivial to generate these in python, consider the following code:

[user@server]$ python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from uuid import uuid4
>>> uuid4().urn[4:]

To get Fedora to accept these though, the 'uuid' namespace needs to be added to the retainPIDs parameter in Fedora's configuration file.

[user@server]$ nano -w /opt/fedora/server/config/fedora.fcfg

Press Ctrl-W and search for retainPID. Add in uuid to the list of namespaces (the ordering is not important):

<param name="retainPIDs" value="demo uuid test changeme ...
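
If you would rather check the edit from a script than by eye, something like the following would do it. It is a rough sketch under a few assumptions: SAMPLE is a cut-down stand-in that I made up, not the real fedora.fcfg, and retained_namespaces is a hypothetical helper name. It matches on the element's local name, so it should not matter whether the real file declares an XML namespace.

```python
import xml.etree.ElementTree as ET

# Made-up, cut-down stand-in for fedora.fcfg; the real file is much larger.
SAMPLE = """<server>
  <module role="example">
    <param name="retainPIDs" value="demo uuid test changeme"/>
  </module>
</server>"""


def retained_namespaces(xml_text):
    """Return the list of namespaces in the first retainPIDs param found."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare on the local name so a default XML namespace does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "param" and element.get("name") == "retainPIDs":
            return element.get("value", "").split()
    return []


print("uuid" in retained_namespaces(SAMPLE))
```

To run it against the real thing, read /opt/fedora/server/config/fedora.fcfg in binary mode and pass the contents to retained_namespaces.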

Step 9 - Installing Solr

[NB you will only have to follow the guide below, but here are the official docs, should you get in trouble - Basic installation - Tomcat specific things to bear in mind]

Extract the whole archive somewhere on disc and you will see something like this in the apache-solr-1.2.0 folder:

~/apache-solr-1.2.0$ ls
build.xml CHANGES.txt dist docs example KEYS.txt lib LICENSE.txt NOTICE.txt README.txt src

~/apache-solr-1.2.0$ ls dist
apache-solr-1.2.0.jar apache-solr-1.2.0.war

The easiest thing is to install Solr straight into the instance of Tomcat that Fedora has installed. One thing to be aware of is that search applications eat RAM and Heap for breakfast, so make sure you install it onto a server with plenty of RAM and it would be wise to increase the amount of Heap space available to the Tomcat instance. This can be done by making sure that the environment variable CATALINA_OPTS is set to "-Xmx512m". This can be done inside the script in your /opt/fedora/tomcat/bin directory.

[i.e. just add CATALINA_OPTS="-Xmx512m" at the beginning of the file if it doesn't already exist.]

One final bit of advice before I point you at the rather good installation docs is that you might want to rename the .war file to match with the URL pathname you desire, as the guide relies on Tomcat automatically unpacking the archive:

So, a war called "apache-solr-1.2.0.war" will result in the final app being accessible at http://tomcat-hostname:8080/apache-solr-1.2.0/. We will rename ours when we copy it into Tomcat's webapps directory.

Finally, Solr needs a place to keep its configuration files and its indexes. The indexes themselves have the capability to get huge (1GB is not unheard of) and need somewhere to be stored. The documentation linked to below will refer to this location as 'your solr home', so it would be wise to make sure that this location has the space to expand. (NB this is not the directory inside Tomcat where the application was unbundled.)

So, let's create a solr home in /opt as we did for fedora (NB change user):

[user@server]$ sudo -s
[root@server]# mkdir /opt/solr
[root@server]# chown user:user /opt/solr

Place the solr.war into Fedora's Tomcat instance:

[root@server]# exit
[user@server]$ pwd
[user@server]$ cp dist/apache-solr-1.2.0.war $CATALINA_HOME/webapps/solr.war

Finally, we have to make sure a variable is available in Tomcat's environment; the location of the Solr home directory. Remember that CATALINA_OPTS line we added before? Amend that now to look like:

(E.g. via nano -w $CATALINA_HOME/bin/ )

CATALINA_OPTS="-Xmx512m -Dsolr.solr.home=/opt/solr"

Now, as we will shape the Solr search service later on (i.e. choosing the fields to be indexed, and how to index them for faceted searching) we will just copy across the basic solr example, to make sure everything is running fine.

[Make sure you are in the unpacked solr directory:]
[user@server]$ pwd
[user@server]$ cp -a example/solr/* /opt/solr
[user@server]$ ls /opt/solr
bin conf README.txt

Adding HTTP authentication to Solr update

First add a username/password to tomcat/conf/tomcat-users.xml:

<user username="solradmin" password="XXXXXXXX" roles="solradmin"/>

Then, in your Solr context, in tomcat/webapps/solr/WEB-INF/web.xml, add the following:

.... usual stuff ....

<!-- Define the Login Configuration for this Application -->
<realm-name>Auth needed</realm-name>


NB BASIC authentication sends the password in plain text, so this isn't too great, but it is suitable for a localhost updater. Change this to DIGEST to increase the security, but bear in mind you may need to set the Realm for the Tomcat container and the Digest hash mechanism (SHA1, MD5, etc.).
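
To see why BASIC is only really fit for localhost, here is a minimal sketch of what an updater script would send. The helper names are hypothetical, the URL http://localhost:8080/solr/update is my assumption about where the protected update handler lives, and nothing is actually sent over the network here.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_update_request(url, xml_body, username, password):
    """Prepare (but do not send) an authenticated POST to the update handler."""
    request = urllib.request.Request(url, data=xml_body.encode("utf-8"))
    request.add_header("Content-Type", "text/xml; charset=utf-8")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


header = basic_auth_header("solradmin", "XXXXXXXX")
# Anyone who can see the header can reverse it: base64 is an encoding,
# not encryption.
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(header)
print(decoded)
```

Passing the prepared request to urllib.request.urlopen would send it; over anything but the loopback interface, the credentials would be readable by anyone watching the traffic.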

(Some good guides to securing Tomcat services are but a Google search away - for example: )

Step 10 - Test your foundation

Now, we need to start up Fedora, and hopefully, it will all go smoothly:

[user@server]$ cd /opt/fedora/tomcat/bin/
[user@server]$ ./
Using CATALINA_BASE: /opt/fedora/tomcat
Using CATALINA_HOME: /opt/fedora/tomcat
Using CATALINA_TMPDIR: /opt/fedora/tomcat/temp
Using JRE_HOME: /usr/lib/jvm/java-1.5.0-sun

Now try these links:

http://localhost:8080/fedora/describe - make sure 'uuid' is one of the retainPIDs
http://localhost:8080/solr/admin Should look like a whole heap of options and bells and whistles.

Any 404 or 500 Server errors mean that something has come unstuck. But, if you've followed this guide using Ubuntu Gutsy, you should be all set without a problem - I just followed it on my home computer without a hitch :)
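
If you only have a terminal to hand, the same two checks can be scripted. This is a minimal sketch for a current Python 3; the function names are mine, and it assumes the default port 8080 chosen during the Fedora install.

```python
import urllib.error
import urllib.request

URLS = [
    "http://localhost:8080/fedora/describe",
    "http://localhost:8080/solr/admin",
]


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 404, 500 and friends arrive as exceptions; report the code.
        return error.code
    except OSError:
        # Connection refused, timeout, DNS failure and so on.
        return None


def check(urls, fetch=fetch_status):
    """Map each URL to the status code returned by fetch."""
    return {url: fetch(url) for url in urls}


def all_ok(statuses):
    """True only if every URL answered with a plain 200."""
    return all(code == 200 for code in statuses.values())
```

Running all_ok(check(URLS)) on the server should give True once Tomcat has finished starting up. The fetch argument is there so the logic can be tried out without a running server.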

Next up, we are going to build a Pylons interface to do basic CRUD type functionality with the ability to link together items semantically, using the SIOC project's namespace.

Fedora, RDF, Pylons and OpenID - a VRE?

I was toying with the idea of simple, atomic objects with a limited payload - a block of text, a file, or a reference (typically a URL) - and letting the user bind these together using RDF descriptions (the user will have all the technical details hidden from them of course. It'll be point and click to them.)

And I wanted to test this out by letting anyone come along and make use of it. I looked at Shibboleth for the auth layer, but there are some serious roadblocks to casual hacky use of it, which is part of its design (for better not worse, I might add.)

So, I've turned to OpenID for this demo thingy - Anything that takes away the authent/authorise part of an app is a Good Thing(tm) from my point of view.

Right, how to bind it all together? After asking on irc:// 'kwijibo' pointed me towards the SIOC namespace - which has handy classes like #Item and #Space, and very handy properties, such as #about, #attachment, #content, #note, #related_to, and #reply_of.
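
As a rough idea of what such a binding might look like, here is a tiny sketch that writes a single N-Triples statement linking two items. The two PIDs are made up for illustration, the triple helper is a hypothetical name, and I am assuming the items are identified by info:fedora/ URIs in a 'uuid' namespace; the SIOC namespace URI is http://rdfs.org/sioc/ns#.

```python
SIOC = "http://rdfs.org/sioc/ns#"


def triple(subject_pid, sioc_property, object_pid):
    """Return one N-Triples line relating two Fedora objects via a SIOC property."""
    return "<info:fedora/%s> <%s%s> <info:fedora/%s> ." % (
        subject_pid, SIOC, sioc_property, object_pid)


# Made-up PIDs, purely for illustration.
article = "uuid:00000000-0000-4000-8000-000000000001"
comment = "uuid:00000000-0000-4000-8000-000000000002"
print(triple(comment, "reply_of", article))
```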

And, due to planets aligning, synchronicity, and all that type of stuff, I am going to implement it as the final part of my tutorial on building a website backed by Fedora-Commons.

Wednesday, 20 February 2008

OpenOffice 2.4.0 provides PDF/A export

It's the littlest things that get me excited these days...

Exporting PDF/A for long-term archiving [from OpenOffice 2.4]

You can test it out using the newest release candidate here: Unstable Developer's snapshot for OO 2.4.0

Well, that sorts that out ;)

Amazon's SimpleDB supports GET with an Action param... that does stuff?

After the CRIG RESTful meeting the other day, REST has been at the forefront of my mind, particularly the discussions about idempotent requests. With this in mind, the following post grabbed my attention:

This article sums up a few things, and whilst I don't agree with the guy's reasoning, it's an interesting piece nonetheless.

Monday, 11 February 2008

Deposit by email

(This is an 'if I get time' implementation outline, based on my experiences with other online repositories such as YouTube and Flickr)
  1. User registers to let them deposit material by email.
    • This must be done to set up default licensing of the submitted material, and also to get their agreement to the standard deposit agreement. The submitter needs to be aware that if they choose this mechanism for a deposit, it is, by its very nature, less rigorous, as there is little opportunity for authentication.
  2. They are given an email template and a unique email address to post to:
    • The email template
      • should be a text/plain file that can be opened and edited. The format should be forgiving (not XML then) and should be suitable for simple Dublin Core. I was thinking of a simple (keyword) : ([.*])\n type format, discarding anything which doesn't match.
      • Pass by reference should be acceptable for the submission of items.
      • For Pass by Value, the files should be attached to the email.
    • The email address target
      • for that individual will be of the form (random16letteralphanum) Email sent to this within a pre-determined timeframe will be anticipated to be from that person. E.g.
      • To help curb the inevitable problems faced by people who seek to type this in (rather than just click or copy and paste), avoid the letters I, O, L and the numbers 1 and 0. This should still be enough for a decent amount of randomised combinations.
      • A step to harden this process might be to lock down submissions based on the email address they are sent from, in addition to giving them a unique target. But I have a feeling that this may not be necessary.
  3. An email is sent in reply either:
    • giving them their persistent, citable URL to this item (not plural)
      • [Thoughts: How about passing an ORE map to the system...? Might have some mileage]
    • Stating why their submission didn't make it (likely a metadata issue.)
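
Two pieces of the outline above are easy to sketch in Python: generating the random part of the email address, and parsing the forgiving 'keyword : value' template. Treat this as a minimal sketch under my own assumptions: the function names are made up, the accepted keywords are the fifteen simple Dublin Core element names, and keywords are matched case-insensitively.

```python
import re
import secrets
import string

# Letters and digits minus the easily confused I, O, L, 1 and 0.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "IOL10"
)

# The fifteen simple Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

LINE = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.*?)\s*$")


def new_token(length=16):
    """Return a random token for the local part of a deposit address."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def parse_template(text):
    """Collect 'keyword : value' lines, discarding anything that doesn't match."""
    fields = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        key, value = match.group(1).lower(), match.group(2)
        if key in DC_ELEMENTS and value:
            fields.setdefault(key, []).append(value)
    return fields


sample = """Title: Deposit by email
Creator: A. N. Author
creator : Another Author
Here is some chatter that should be ignored.
http://example.org/not-a-field
Rights: CC-BY
"""
print(new_token())
print(parse_template(sample))
```

With those five characters removed the alphabet has 31 symbols, so a 16 character token still has 31**16 possible values, which is plenty for this purpose.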

Saturday, 2 February 2008

Yahoo Pipes + Solr's API = RSS feeds for repository submitters

Quick post: Part of the reason for trying to get as many open APIs onto the services I put in place for the repository is so that I don't have to customise things for every department or use; the interested parties can do it for themselves.

As a little proof, I have set up a Pipe that takes repository search results, based on an author's name, and creates an RSS feed from it:
You should be able to clone this pipe, play with it, or simply use it for its intended purpose.