Our thoughts in real time
Amazon EC2
November 29, 2011
How to set up and exploit an Apache Solr environment on Amazon EC2
What’s Solr?
Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
What’s EC2?
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. An Amazon Machine Image (AMI) is a special type of pre-configured operating system and virtual application software which is used to create a virtual machine within the Amazon Elastic Compute Cloud (EC2). It serves as the basic unit of deployment for services delivered using EC2.
Requirements
This tutorial uses:
- An AMI consisting of a 32-bit base Fedora 8 install (most Linux-based AMIs will work fine)
- Java version: 1.6.0_29
- Solr version: 3.4.0
Let’s now look at the actual installation steps:
0) Open port 8983
Before starting, it’s necessary to open port 8983, which is the port that Solr listens on by default.
This can be done by adding a rule to the security group that the instance is launched in.
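For readers who prefer the command line to the web console, the rule can be sketched with the EC2 API tools. This is a minimal sketch, assuming those tools are installed and configured and that the instance runs in the group named "default"; the function name and the EC2_AUTHORIZE override are hypothetical.

```shell
# Assumptions: the EC2 API command-line tools are installed and configured,
# and the instance was launched in the security group named "default".
# EC2_AUTHORIZE is a hypothetical override that makes dry runs possible.
open_solr_port() {
    group="${1:-default}"
    port="${2:-8983}"
    # -P protocol, -p port, -s source network allowed to connect.
    # 0.0.0.0/0 admits every address; narrow it if only a few clients need access.
    "${EC2_AUTHORIZE:-ec2-authorize}" "$group" -P tcp -p "$port" -s 0.0.0.0/0
}

# Example call:
# open_solr_port default 8983
```

Setting EC2_AUTHORIZE=echo prints the arguments instead of changing the group, which is a cheap way to check the rule before applying it.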
1) Install Java
To check whether Java is already installed on the machine, enter the command:
java -version
If the response is:
-bash: java: command not found
Java is not installed on the machine.
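The same check can be scripted, which is handy when the setup is automated. The sketch below uses the standard `command -v` lookup; the function name have_command is made up for illustration.

```shell
# Report whether a command is available on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    java -version
else
    echo "java is not installed"
fi
```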
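To download and install Java, enter the following commands (the last one runs the self-extracting installer and needs root privileges):
wget http://download.oracle.com/otn-pub/java/jdk/6u29-b11/jdk-6u29-linux-i586-rpm.bin
chmod +x jdk-6u29-linux-i586-rpm.bin
./jdk-6u29-linux-i586-rpm.bin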
2) Install Solr
To get Solr, enter:
wget http://mirror.nyi.net/apache//lucene/solr/3.4.0/apache-solr-3.4.0.tgz
tar xzf apache-solr-3.4.0.tgz
3) Start Solr
Solr comes with its own servlet container, Jetty, bundled with the package above.
To start Solr, go to the example directory:
cd apache-solr-3.4.0/example/
and enter:
java -jar start.jar
To verify that the server is running correctly, open a web browser and enter the following URL:
http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:8983/solr/admin/
(where xxx-xxx-xxx-xxx stands for the address part of the instance's public DNS name)
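Jetty takes a few seconds to come up, so a scripted check usually needs to retry. Here is a minimal sketch of a generic retry helper; the name wait_until is hypothetical, and the commented example assumes that curl is installed and that the ping handler from the example configuration is enabled.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"; delay="$2"; shift 2
    while [ "$attempts" -gt 0 ]; do
        # Success on any attempt ends the wait immediately.
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: try for about a minute, then give up.
# wait_until 30 2 curl -sf "http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:8983/solr/admin/ping"
```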
4) Start Indexing
Now that the Solr server is running, it is possible to start building a simple index.
To create a new index go to the exampledocs folder and enter:
java -jar post.jar *.xml
This command will index all the .xml documents contained in the exampledocs folder.
Solr allows you to import data in many different ways and formats. It is possible to index CSV files, JSON documents, and .pdf and .doc documents through Solr Cell (http://wiki.apache.org/solr/ExtractingRequestHandler), and it is possible to pull records directly from a database through the Data Import Handler (http://wiki.apache.org/solr/DataImportHandler).
One way to search the newly indexed files is to go to the admin page (http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:8983/solr/admin/) and query the index with the Solr query language (http://wiki.apache.org/solr/SolrQuerySyntax).
This language is an extension of the Lucene query language and allows you to exploit the powerful searching features of Solr by simply adding some parameters to the query.
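Queries are plain HTTP requests, so "adding some parameters" can be illustrated by building the URL by hand. This is a sketch only: the function name solr_select_url is hypothetical, it targets the default select handler on port 8983, and it assumes the query text is already URL-safe.

```shell
# Build a select URL. Assumption: the query string is already URL-safe
# (no spaces or reserved characters); encode it first otherwise.
# Usage: solr_select_url HOST QUERY [ROWS]
solr_select_url() {
    host="$1"; query="$2"; rows="${3:-10}"
    # wt=json asks for a JSON response, indent=on makes it readable.
    printf 'http://%s:8983/solr/select?q=%s&rows=%s&wt=json&indent=on\n' "$host" "$query" "$rows"
}

# Example, run on the instance itself:
# curl "$(solr_select_url localhost video 5)"
# Faceting is one more pair of parameters, e.g. &facet=true&facet.field=cat
```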
A more interactive way to search over the created index is to open the browser and go to the following URL:
http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:8983/solr/browse
An example search interface lets you search the documents and shows some of Solr's most relevant features (autocomplete, faceting, highlighting…).
Client libraries for Solr are available in several languages, so developers can pick the one that suits their preferences.
These simple steps are all you need to run a Solr instance on Amazon EC2.
Now that the project is set up you can take advantage of your Solr-AMI to perform all kinds of smart searches for your own projects and websites.
Don’t forget to visit and explore the Solr Wiki Page for more advanced uses and customizations of Solr.
Alberto Montagnese
Posted at 10:39 PM in Amazon EC2, Java, Web/Tech
March 28, 2009
Launching EC2 On Demand: Video Transcoding
At Grio, we use EC2 to power almost all of our server needs. Amazon hosting provides a convenient means of housing a web server and database server, a wiki, and our client development environment. It's a cost-efficient solution for companies like ours, in that we can avoid the hassle of purchasing and maintaining hardware. The strategy allows us to add servers only when we need them and remove them when they are no longer needed. Since Amazon's pricing structure is based on the duration of a server's uptime, we want to make sure that we only use a server when necessary.
If you're interested in saving money (who isn't?), your EC2 instances should have limited idle time. In this blog entry, I'll discuss how one can create an EC2 instance (which is equivalent to a "server"), use it to process computing-intensive tasks, and terminate it when the process is completed. We assume some familiarity with the EC2 API.
As a real-world example, we'll build a server that transcodes high-definition movies, which are ill-suited for web delivery, into smaller Flash-based FLV files that are more appropriate for that purpose. Once the transcoding process completes, the server will terminate. Transcoding is a very CPU-intensive process, so co-locating your video file manipulation with your web server may not be such a good idea. At the same time (unless you are YouTube and need real-time transcoding), having a dedicated video transcoding EC2 instance running 24/7 may be wasteful and will incur high hosting costs (at the time of this writing, about $70/month for a small instance and $550/month for an extra-large one).
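To get a feel for the savings, the monthly figures above can be scaled down to the hours actually used. This is a rough sketch under stated assumptions: a 30-day month of 720 hours, cost strictly proportional to uptime, and a made-up workload of two hours of transcoding per day (60 hours a month). The function name is hypothetical.

```shell
# Rough cost of running an instance only when needed.
# Assumptions: a 30-day month (720 hours) and billing proportional to uptime.
# Usage: on_demand_cost MONTHLY_PRICE_USD HOURS_USED
on_demand_cost() {
    awk -v monthly="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", monthly / 720 * hours }'
}

on_demand_cost 70 60     # small instance, 2 hours a day
on_demand_cost 550 60    # extra-large instance, 2 hours a day
```

Under these assumptions the two calls print 5.83 and 45.83, that is, roughly $6 and $46 a month instead of $70 and $550.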
Before we start, let's pick an AMI (Amazon Machine Image) that will be used as the basis of our work. We'll use Amazon's Fedora 8 public AMI (ami-id=ami-f51aff9c). Then, we'll need to perform the following tasks:
- Configure the instance to run a custom script on instance launch.
- Bundle and register the configured instance.
- Create a script to install the software required for video transcoding.
- Write a script to download videos, transcode them, and send the results to a destination such as S3.
- Launch the instance, passing it the scripts.
ec2-run-instances ami-f51aff9c
wget http://169.254.169.254/1.0/user-data -O /tmp/autorun
sh /tmp/autorun
The wget call pulls down the data that ec2-run-instances uploads when it is called with the -f argument. So, if the following command is executed locally...
ec2-run-instances ami-xxxxxxxx -f transcoding-bootstrap.sh
Client.InvalidParameterValue: User data is limited to 16384 bytes
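Because of the 16384-byte limit quoted in the error above, it can be worth checking the size of the script before launching. A minimal sketch follows; the function name user_data_fits is hypothetical, and it assumes the limit applies to the size of the file as stored on disk.

```shell
# Limit taken from the error message above.
USER_DATA_LIMIT=16384

# Succeed only if the file is small enough to be passed as user data.
user_data_fits() {
    # The arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$USER_DATA_LIMIT" ]
}

# Example:
# user_data_fits transcoding-bootstrap.sh && ec2-run-instances ami-xxxxxxxx -f transcoding-bootstrap.sh
```

If the script is too large, one option is to keep the user data tiny and have it download the real script from somewhere like S3.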
ec2-bundle-vol -d /mnt/ --cert [your_certificate] --private [your_privatekey] --user [your_user_access_id]
ec2-upload-bundle --access-key [your_access_key] --secret-key [your_secret_key] --bucket [your_s3_bucket] --manifest image.manifest.xml
ec2-register [your_s3_bucket]/image.manifest.xml
Installing Transcoding Software
#
# install ffmpeg
#
yum -y install ncurses-devel gcc gcc-c++ libtool svn git yasm gsm-devel libogg-devel libvorbis-devel libtheora-devel
svn export svn://svn.mplayerhq.hu/ffmpeg/trunk /mnt/ffmpeg-trunk-source
cd /mnt/ffmpeg-trunk-source
git clone git://git.videolan.org/x264.git
cd x264
./configure --prefix=/usr --enable-shared --enable-pthread --disable-asm
make
make install
cd ..
wget http://liba52.sourceforge.net/files/a52dec-0.7.4.tar.gz
tar -zxvf a52dec-0.7.4.tar.gz
cd a52dec-0.7.4
./configure --prefix=/usr --enable-double
make
make install
cd ..
wget http://downloads.sourceforge.net/faac/faac-1.26.tar.gz
tar -zxvf faac-1.26.tar.gz
cd faac
autoreconf -vif
./configure --prefix=/usr
make
make install
cd ..
wget http://downloads.sourceforge.net/faac/faad2-2.6.1.tar.gz
tar -zxvf faad2-2.6.1.tar.gz
cd faad2
autoreconf -vif
./configure --prefix=/usr
make
make install
cd ..
wget http://downloads.sourceforge.net/lame/lame-3.98b8.tar.gz
tar -zxvf lame-3.98b8.tar.gz
cd lame-3.98b8
./configure --prefix=/usr
make
make install
cd ..
wget http://libmpeg2.sourceforge.net/files/mpeg2dec-0.4.1.tar.gz
tar -zxvf mpeg2dec-0.4.1.tar.gz
cd mpeg2dec-0.4.1
./configure --prefix=/usr
make
make install
cd ..
wget http://downloads.xvid.org/downloads/xvidcore-1.1.3.tar.gz
tar -zxvf xvidcore-1.1.3.tar.gz
cd xvidcore-1.1.3/build/generic
./configure --prefix=/usr
make
make install
cd ../../../
wget http://ftp.penguin.cz/pub/users/utx/amr/amrnb-7.0.0.1.tar.bz2
tar -jxvf amrnb-7.0.0.1.tar.bz2
cd amrnb-7.0.0.1
./configure --prefix=/usr
make
make install
cd ..
./configure --prefix=/usr --enable-static --enable-shared --enable-gpl --enable-nonfree --enable-postproc --enable-avfilter --enable-avfilter-lavf --enable-libamr-nb --enable-libfaac --enable-libfaad --enable-libfaadbin --enable-libmp3lame --enable-libvorbis --enable-libx264 --enable-libxvid
make
make install
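The install script repeats the same download, unpack, configure and make cycle for most of the libraries. One way to shorten it is a small helper like the sketch below. Both function names are hypothetical and not part of the original script, and the helper only covers the .tar.gz archives that build with a plain ./configure (not, for example, the amrnb .tar.bz2 archive or packages that need autoreconf first).

```shell
# Hypothetical helpers, not part of the original script.
# Print the archive file name, i.e. the last path component of a URL.
tarball_name() {
    printf '%s\n' "${1##*/}"
}

# Usage: build_from_tarball URL SOURCE_DIR [EXTRA_CONFIGURE_FLAGS...]
# Only handles .tar.gz archives; each step runs only if the previous one succeeded.
build_from_tarball() {
    url="$1"; src_dir="$2"; shift 2
    wget "$url" &&
        tar -zxf "$(tarball_name "$url")" &&
        ( cd "$src_dir" && ./configure --prefix=/usr "$@" && make && make install )
}

# Example, equivalent to the a52dec steps above:
# build_from_tarball http://liba52.sourceforge.net/files/a52dec-0.7.4.tar.gz a52dec-0.7.4 --enable-double
```

Running the build inside a subshell means the caller's working directory is unchanged, so no `cd ..` is needed afterwards.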
while more videos are available {
download [video_file]
ffmpeg -i [video_file] -f flv -b 350k [output_file]
push [output_file] to destination
}
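The loop above is pseudocode. A runnable sketch of its transcoding step could look like the following, assuming the videos have already been downloaded into a local directory; the function names and the /mnt/videos path are hypothetical, and the download and upload steps are left out because they depend on where the videos live.

```shell
# Derive the output name by swapping the file extension for .flv.
flv_name() {
    printf '%s\n' "${1%.*}.flv"
}

# Transcode every regular file in a directory, skipping existing .flv files.
# Stops at the first failure so that problems are not silently ignored.
transcode_all() {
    in_dir="$1"
    for video_file in "$in_dir"/*; do
        [ -f "$video_file" ] || continue
        case "$video_file" in *.flv) continue ;; esac
        # -f flv selects the FLV container, -b 350k a bitrate of about 350 kbit/s.
        ffmpeg -i "$video_file" -f flv -b 350k "$(flv_name "$video_file")" || return 1
    done
}

# On the instance (hypothetical path), halting afterwards so the instance stops running:
# transcode_all /mnt/videos && shutdown -h now
```

With `&&`, the machine stays up when a transcode fails, which helps debugging but keeps the meter running; replace it with `;` to shut down regardless.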
ec2-run-instances ami-a123beef -f /path/to/transcode.sh
Posted at 08:44 PM in Amazon EC2