To provide file and data search within a LAN, I developed a search engine named MyLan. MyLan was my first attempt at distributed computing. It lets users search for files, and for data within those files, across a local area network. At Hughes Software Systems (HSS), where I worked before joining the MS program at UMR, a set of 14 networked computers provided this distributed search capability. When I left HSS, MyLan was providing the search facility across a total of 5000 connected computers, with an overall search file size of over 2 GB. It uses a master-slave architecture: some machines act as masters, coordinating the search task, while the slave machines search for the contents and pass the results on to the master, which is then responsible for formatting the data and presenting it to the user.
MyLan consists of seven modules.
There is a PowerPoint presentation on this. I am thankful to Imran Ibrahim for preparing it. See the FlowChart.
User Interaction Module
The User Interaction module consists of a servlet that accepts the search
criterion entered by the user. The servlet forwards the search data to
the load balancer module.
Load Balancer
The Load Balancer, as the name implies, is responsible for redirecting the
search request to one of multiple search managers, which may be on the same
machine or on different machines. It identifies a search manager that
is idle and forwards the search request to that manager.
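The idle-manager selection described above can be sketched as follows. This is only an illustration, not the original MyLan code: the `SearchManager` record, its `busy` flag, and the function name `dispatchToIdleManager` are assumptions made for the example, and the choice of "first idle manager wins" is one simple policy among several.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one registered search manager.
struct SearchManager {
    std::string host;  // machine the manager runs on
    bool busy;         // true while it is handling a request
};

// Returns the index of the first idle manager and marks it busy,
// or -1 when every manager is currently occupied.
int dispatchToIdleManager(std::vector<SearchManager>& managers) {
    for (std::size_t i = 0; i < managers.size(); ++i) {
        if (!managers[i].busy) {
            managers[i].busy = true;
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A caller would forward the search request to `managers[index]` when the index is not -1, and queue or retry the request otherwise.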
Search Manager
The search manager is responsible for managing a pool of search
workers. Search workers are the workhorses of MyLan, responsible for searching
for the criterion entered by the user.
Search Worker
The search worker, as said earlier, is the workhorse of MyLan. It memory-maps
the search file assigned to it. Since memory mapping is used, the number of search
workers per machine is restricted to one for optimum performance. This
restriction exists because multiple search workers on one machine, each
memory-mapping the same file, would cause unnecessary page faults during
context switching and slow down the search.
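Once a file is memory-mapped, the worker sees it as one contiguous range of bytes and can scan it directly. The sketch below shows such a scan under two assumptions that are not stated in the original: the search file holds one entry per line, and a match is a plain substring hit. The function name `searchMappedRegion` is hypothetical. On Windows the pointer would typically come from a mapped view (for example one obtained through the Win32 `MapViewOfFile` call); here any in-memory buffer stands in for it so the example stays portable.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Scans the byte range [data, data + size) line by line and returns
// every line that contains `term` as a substring.
// Assumption: one repository entry per line, separated by '\n'.
std::vector<std::string> searchMappedRegion(const char* data, std::size_t size,
                                            const std::string& term) {
    std::vector<std::string> hits;
    if (term.empty()) return hits;
    const char* end = data + size;
    const char* lineStart = data;
    while (lineStart < end) {
        const char* lineEnd = std::find(lineStart, end, '\n');
        // std::search returns lineEnd when the term is absent from the line.
        if (std::search(lineStart, lineEnd, term.begin(), term.end()) != lineEnd) {
            hits.push_back(std::string(lineStart, lineEnd));
        }
        lineStart = (lineEnd == end) ? end : lineEnd + 1;
    }
    return hits;
}
```

Because the function only reads through a pointer and a length, it never copies the file; pages are pulled in by the operating system as the scan touches them.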
File Server
The File Server is responsible for collecting the shared file information from the LAN
periodically (daily at 6:30 PM). Once the information is collected, the File Server
informs the Load Balancer, which in turn tells all the search
managers to re-register the search workers connected to them. Registration
essentially means reassigning the search data among the workers.
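One way to picture the reassignment step is as splitting the repository into contiguous slices, one per worker. The sketch below is an assumption about how that could be done, not the original scheme: it divides a count of entries as evenly as possible, and the name `assignRanges` is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Splits `totalEntries` repository entries into contiguous half-open
// ranges [first, second), one per worker. Range sizes differ by at most
// one entry. Returns an empty list when there are no workers.
std::vector<std::pair<std::size_t, std::size_t> >
assignRanges(std::size_t totalEntries, std::size_t workers) {
    std::vector<std::pair<std::size_t, std::size_t> > ranges;
    if (workers == 0) return ranges;
    std::size_t base = totalEntries / workers;   // entries every worker gets
    std::size_t extra = totalEntries % workers;  // leftovers, one each to the first workers
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t length = base + (w < extra ? 1 : 0);
        ranges.push_back(std::make_pair(begin, begin + length));
        begin += length;
    }
    return ranges;
}
```

For example, 10 entries over 3 workers yields the ranges [0, 4), [4, 7), and [7, 10). Re-running the split after each daily collection keeps the load even as the repository grows or workers come and go.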
Download Manager
The Download Manager is responsible for downloading the requested file when
the user clicks on a link. You might ask whether the web server provides this
facility by default. The answer is no, because the file lives on a
shared drive on a remote machine. Connecting to it requires authentication
in the form of a username and password, which the web server cannot
supply. So I wrote a server program that listens on port 1024. Whenever
the user clicks on a link to download a file, the request is
directed to the server at port 1024 on my machine. The link is URL-encoded,
so the username, password, and machine name are extracted from it and
the specified file is downloaded.
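The extraction step can be illustrated with a small decoder. The exact link format MyLan used is not given here, so the sketch assumes a conventional `key=value&key=value` query string with hypothetical field names (`user`, `pass`, `machine`, `file`); `urlDecode` and `parseDownloadQuery` are likewise names invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Decodes %XX escapes and '+' (as a space) in a URL-encoded string.
// A '%' that is not followed by two hex digits is copied through unchanged.
std::string urlDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() &&
            std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(
                std::strtol(in.substr(i + 1, 2).c_str(), nullptr, 16));
            i += 2;
        } else if (in[i] == '+') {
            out += ' ';
        } else {
            out += in[i];
        }
    }
    return out;
}

// Splits "k1=v1&k2=v2" into decoded key/value pairs.
// Segments without '=' are ignored.
std::map<std::string, std::string> parseDownloadQuery(const std::string& query) {
    std::map<std::string, std::string> fields;
    std::size_t pos = 0;
    while (pos <= query.size()) {
        std::size_t amp = query.find('&', pos);
        if (amp == std::string::npos) amp = query.size();
        std::string pair = query.substr(pos, amp - pos);
        std::size_t eq = pair.find('=');
        if (eq != std::string::npos) {
            fields[urlDecode(pair.substr(0, eq))] = urlDecode(pair.substr(eq + 1));
        }
        pos = amp + 1;
    }
    return fields;
}
```

With the assumed format, a query such as `user=guest&pass=guest&machine=PC12&file=docs%5Creport.doc` would give the server the credentials, the machine name, and the path `docs\report.doc` it needs to open the share and stream the file back.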
Net Spider
This module is responsible for crawling the network, identifying
machines, and connecting to them using the default username "guest" and
password "guest". Each machine is enumerated for its shared directories.
Then each shared drive is searched recursively for file names, and the information
collected is stored in a single file. This is the central repository of
information, which is divided among the distributed search workers
spread over different machines. The primary requirement for a file to appear
in the search engine is that its directory be shared with the username
"guest" and password "guest". So you always have control over which files
are visible in MyLan.
Except for the servlet part, the entire program is written in C++ for Windows. I was using the Apache web server with Tomcat 3.2.1. I had 2 search managers running on my machine, each with around 6 search workers. That is, a total of 14 machines were involved in the search operation. The central search repository is around 2 GB in size, and it holds just the names of the files shared on our LAN. This means we have an enormous amount of data shared on the LAN.