What is MyLan?

To provide file and data search within a LAN, I developed a search engine named MyLan. MyLan was my first attempt at distributed computing. It lets users search for files, and for data within those files, across a local area network. At Hughes Software Systems (HSS), where I worked before joining UMR for my MS, a set of 14 networked computers provided this distributed search capability. When I left HSS, MyLan was providing search over the shared files of a total of 5000 connected computers, with an overall search file size of over 2 GB. MyLan uses a master-slave architecture: some machines act as masters that coordinate the search task, while the slave machines search the contents and pass the results on to the master, which then formats the data and presents it to the user.

MyLan consists of seven modules:
 

        • User Interaction Module
        • Load Balancer
        • File Server
        • Search Manager
        • Search Worker
        • Download Manager
        • Net Spider
See the FlowChart
There is also a PowerPoint presentation on MyLan. I am thankful to Imran Ibrahim for preparing it.

User Interaction Module
        The User Interaction Module consists of a servlet that accepts the search criteria entered by the user and forwards them to the Load Balancer.

Load Balancer
        As the name implies, the Load Balancer distributes search requests among multiple Search Managers, which may run on the same machine or on different machines. It identifies a Search Manager that is idle and forwards the search request to it.
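The selection rule described above can be sketched in C++, the language MyLan was written in. This is a minimal illustration under stated assumptions, not the original code: the ManagerInfo record, its fields, and the functions pick_idle_manager, dispatch, and release are hypothetical, and the sketch assumes the balancer keeps a simple busy flag for each manager. The text does not say what happens when every manager is busy; here the caller just receives std::nullopt and could queue or retry the request.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical bookkeeping record the Load Balancer keeps per Search Manager.
struct ManagerInfo {
    std::string host;  // machine running the Search Manager (assumed field)
    int port;          // port the Search Manager listens on (assumed field)
    bool busy;         // true while the manager is handling a request
};

// Returns the index of the first idle Search Manager, or std::nullopt if all are busy.
std::optional<std::size_t> pick_idle_manager(const std::vector<ManagerInfo>& managers) {
    for (std::size_t i = 0; i < managers.size(); ++i) {
        if (!managers[i].busy) {
            return i;
        }
    }
    return std::nullopt;
}

// Picks an idle manager and marks it busy before the request is forwarded,
// so the next incoming request is sent to a different manager.
std::optional<std::size_t> dispatch(std::vector<ManagerInfo>& managers) {
    std::optional<std::size_t> idx = pick_idle_manager(managers);
    if (idx) {
        managers[*idx].busy = true;
    }
    return idx;
}

// Called when a Search Manager reports that it has finished its request.
void release(std::vector<ManagerInfo>& managers, std::size_t idx) {
    assert(idx < managers.size() && managers[idx].busy);
    managers[idx].busy = false;
}
```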

Search Manager
        The Search Manager manages a pool of Search Workers. Search Workers are the workhorses of MyLan: they search for the criteria entered by the user.
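A rough sketch of how a Search Manager might fan a query out to its pool and gather the answers, assuming each worker holds its share of the repository as a list of file names. The names Chunk, worker_search, and manager_search are hypothetical, and the remote workers are simulated here with local std::async tasks rather than network calls.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

// One worker's share of the repository: a list of file names (assumed layout).
using Chunk = std::vector<std::string>;

// Stand-in for a remote Search Worker: returns the entries in its chunk that
// contain the query string.
std::vector<std::string> worker_search(const Chunk& chunk, const std::string& query) {
    std::vector<std::string> hits;
    for (const auto& entry : chunk) {
        if (entry.find(query) != std::string::npos) {
            hits.push_back(entry);
        }
    }
    return hits;
}

// Search Manager: sends the query to every worker in its pool concurrently,
// waits for all of them, and concatenates the results in worker order.
std::vector<std::string> manager_search(const std::vector<Chunk>& pool, const std::string& query) {
    assert(!query.empty());
    std::vector<std::future<std::vector<std::string>>> pending;
    for (const auto& chunk : pool) {
        pending.push_back(std::async(std::launch::async, worker_search, std::cref(chunk), query));
    }
    std::vector<std::string> results;
    for (auto& f : pending) {
        std::vector<std::string> hits = f.get();
        results.insert(results.end(), hits.begin(), hits.end());
    }
    return results;
}
```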

Search Worker
        As mentioned above, the Search Worker is the workhorse of MyLan. It memory-maps the search file assigned to it. Because memory mapping is used, the number of Search Workers per machine is restricted to one for optimum performance: if several Search Workers on one machine each memory-map their search files, they compete for the same physical memory, and switching between them causes unnecessary page faults, which slows down the search.
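The worker's scan over its memory-mapped file might look roughly like the following. This sketch assumes the search file holds one record (for example, one file path) per line, and it uses POSIX mmap so that it runs on any POSIX system; the original Windows program would have used the Win32 file-mapping calls CreateFileMapping and MapViewOfFile instead. The function name search_mapped_file is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Memory-maps the worker's search file read-only and returns every line
// that contains `query`. Assumes one record per line.
std::vector<std::string> search_mapped_file(const char* path, std::string_view query) {
    assert(path != nullptr);
    std::vector<std::string> hits;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return hits;  // file missing or unreadable
    }
    struct stat st{};
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return hits;  // cannot stat the file, or it is empty (nothing to map)
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) {
        return hits;
    }
    // Pages are loaded on demand by the OS as the scan touches them.
    std::string_view data(static_cast<const char*>(addr), size);
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t end = data.find('\n', start);
        if (end == std::string_view::npos) {
            end = data.size();
        }
        std::string_view line = data.substr(start, end - start);
        if (line.find(query) != std::string_view::npos) {
            hits.emplace_back(line);
        }
        start = end + 1;
    }
    munmap(addr, size);
    return hits;
}
```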

File Server
        The File Server collects the shared-file information from the LAN periodically (daily at 6:30 PM). Once the information is collected, the File Server informs the Load Balancer, which in turn tells all the Search Managers to re-register all the Search Workers connected to them. Registration essentially means reassigning the search data among the workers.
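Reassigning the search data can be as simple as cutting the repository into contiguous, nearly equal ranges, one per registered worker. The sketch below assumes the data is split by record count; the original may have split by byte size or by some other rule, and partition_records is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits `total` repository records into `workers` contiguous [begin, end)
// ranges whose sizes differ by at most one record.
std::vector<std::pair<std::size_t, std::size_t>> partition_records(std::size_t total,
                                                                   std::size_t workers) {
    assert(workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t base = total / workers;   // every worker gets at least this many
    std::size_t extra = total % workers;  // the first `extra` workers get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < workers; ++i) {
        std::size_t len = base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```

For example, 10 records over 3 workers give the ranges [0, 4), [4, 7), and [7, 10).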

Download Manager
        The Download Manager downloads the requested file when the user clicks on a link. You might ask whether the web server would provide this facility by default. It does not, because the file resides on a shared drive on a remote machine. Connecting to it requires authentication in the form of a username and password, which the web server cannot supply. So I wrote a server program that listens on port 1024. Whenever the user clicks on a link to download a file, the request is directed to this server on port 1024 on my machine. The link is URL-encoded, so the server extracts the username, password, and machine name from the link and downloads the specified file.
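Extracting the connection details from a URL-encoded link could be done along these lines. The query parameter names (user, pass, host, file) are assumptions for illustration, as are the helper names url_decode and parse_query; the listener on port 1024 and the authenticated file copy itself are not shown.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Decodes one URL-encoded component: "%XX" becomes the byte 0xXX and '+' becomes a space.
std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits
        } else {
            out += in[i];  // ordinary character, or a malformed '%' kept as-is
        }
    }
    return out;
}

// Splits a query string such as "user=guest&host=pc17" into decoded key/value pairs.
std::map<std::string, std::string> parse_query(const std::string& query) {
    std::map<std::string, std::string> params;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t amp = query.find('&', start);
        if (amp == std::string::npos) {
            amp = query.size();
        }
        std::string field = query.substr(start, amp - start);
        if (!field.empty()) {
            std::size_t eq = field.find('=');
            if (eq == std::string::npos) {
                params[url_decode(field)] = "";  // key with no value
            } else {
                params[url_decode(field.substr(0, eq))] = url_decode(field.substr(eq + 1));
            }
        }
        start = amp + 1;
    }
    return params;
}
```

For example, parsing "user=guest&pass=guest&host=pc17&file=%5C%5Cpc17%5Cdocs%5Creport.doc" yields the file path \\pc17\docs\report.doc along with the user, password, and host values.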

Net Spider
       This module crawls through the network, identifying machines and connecting to them with the default username "guest" and password "guest". Each machine's shared directories are enumerated, and each shared directory is then recursively searched for file names. The collected information is stored in a single file, the central repository, which is divided among the distributed Search Workers running on different machines. The primary requirement for a file to appear in the search engine is that its directory be shared with the username "guest" and password "guest", so you always control which of your files are visible in MyLan.
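The recursive walk over a shared directory can be sketched with C++17 std::filesystem. This assumes the share is already reachable as a path (on Windows, a UNC path such as \\machine\share after connecting as guest/guest); the network enumeration and the guest login are left out, and collect_file_names and build_repository are hypothetical names. In this sketch, unreachable shares and unreadable entries are skipped rather than aborting the crawl.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ostream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Recursively walks one shared directory and writes the full path of every
// regular file to `repo`, one per line. Returns the number of names written.
std::size_t collect_file_names(const fs::path& share_root, std::ostream& repo) {
    std::error_code ec;
    fs::recursive_directory_iterator it(
        share_root, fs::directory_options::skip_permission_denied, ec);
    if (ec) {
        return 0;  // share missing or not readable: skip it
    }
    std::size_t count = 0;
    for (fs::recursive_directory_iterator end; it != end; it.increment(ec)) {
        if (ec) {
            break;  // stop walking this share on an I/O error instead of throwing
        }
        if (it->is_regular_file(ec)) {
            repo << it->path().string() << '\n';
            ++count;
        }
    }
    return count;
}

// Crawls every share in `shares` and stores all collected file names in one
// central repository file. Returns the total number of names written.
std::size_t build_repository(const std::vector<fs::path>& shares, const fs::path& repo_path) {
    assert(!repo_path.empty());
    std::ofstream repo(repo_path);
    std::size_t total = 0;
    for (const auto& share : shares) {
        total += collect_file_names(share, repo);
    }
    return total;
}
```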
 
 

Except for the servlet, the entire program is written in C++ for Windows. I used the Apache web server with Tomcat 3.2.1. I had 2 Search Managers running on my machine, each with around 6 Search Workers; in total, 14 machines were involved in the search operation. The central search repository is around 2 GB, and it contains only the names of the files shared on our LAN. Since that is just the file names, the amount of data actually shared on the LAN is enormous.