C# threading: returning values

Soldato | Joined: 16 Nov 2003 | Posts: 9,682 | Location: On the pale blue dot
Hi guys.

I need to read 400+ files from the Internet, which of course takes time. Most of that time is latency while requesting each file, as they are very small files. Therefore I want to dabble in threading to, say, make 10 requests at once.

The difficult part is that I read the files into a hashtable which is processed further down the line, but threads do not return values, so I can't simply run 10 threads that read into the same hashtable. Not to mention that 10 threads trying to access the same variable at the same time is likely to cause some very nasty problems.

I'm a bit stumped on how to proceed. Has anyone done anything similar in the past? How can I collate the data the separate threads generate into one place?
 
You synchronize your threads so that access to the hashtable is atomic, using locking, condition variables, etc.

What you could do is read each file into a local memory buffer, wait until the data structure is available, then write to it. You do this by blocking the thread until it is signalled to wake up and write.

Actually, I'm not sure what you're getting at here; you might not even need that synchronization in this case. It's only needed when multiple threads share and act on the same data.
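A minimal C# sketch of that locking approach, assuming a hypothetical `fetch` delegate in place of the real download code. Each thread does the slow fetch into a local variable without holding the lock, and takes the lock only for the quick write into the shared dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LockedFetch
{
    // Fetches every URL on its own thread and collects the results in one dictionary.
    // 'fetch' is a hypothetical stand-in for whatever downloads a page.
    public static Dictionary<string, string> FetchAll(IEnumerable<string> urls, Func<string, string> fetch)
    {
        var results = new Dictionary<string, string>();
        var gate = new object(); // Guards 'results': Dictionary is not safe for concurrent writes.
        var threads = new List<Thread>();

        foreach (string url in urls)
        {
            string u = url; // Per-iteration copy for the lambda (required before C# 5).
            var t = new Thread(() =>
            {
                string data = fetch(u);             // Slow part runs without holding the lock.
                lock (gate) { results[u] = data; }  // Short critical section for the write.
            });
            threads.Add(t);
            t.Start();
        }

        foreach (Thread t in threads) t.Join(); // Wait until every thread has written its result.
        return results; // Safe to read without the lock now: all writers have finished.
    }
}
```

One thread per URL is fine for a handful of files; for 400+ it is worth capping how many run at once.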
 
Thanks for the info so far, guys. Here is my program in a nutshell:

Read into a hashtable a list of URLs from SQL Server
Add to another hashtable the source code of each of these URLs
Do some processing on each of these source strings.

The processing takes seconds; it is fetching the data from the Internet that takes a while, and most of that time is spent idle, waiting for the remote server to respond to the request. Therefore I was considering using threads so that I could make a few web requests simultaneously and get the data back faster.

So imagine that each thread calls a function that gets a single web page and returns its source. I want to get all of these sources into one hashtable so that I can work on them after the threads have completed, but as I understand it, the method a thread runs must always return void.
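The method a `Thread` starts does return `void`, but a common workaround is to give each thread its own object with a field for the result, and read that field after `Join()`. A sketch, with a hypothetical `PageWorker` class and a hypothetical `fetch` delegate in place of the real download code:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical worker: holds its input, and exposes its output once its thread has finished.
public class PageWorker
{
    private readonly Func<string, string> _fetch;
    public string Url { get; private set; }
    public string Result { get; private set; }

    public PageWorker(string url, Func<string, string> fetch)
    {
        Url = url;
        _fetch = fetch;
    }

    // Matches ThreadStart (no arguments, returns void); the output goes into Result instead.
    public void Run()
    {
        Result = _fetch(Url);
    }
}

public static class ReturnViaField
{
    // No lock needed: each thread writes only its own worker's field, and the
    // dictionary is filled afterwards on the calling thread.
    public static Dictionary<string, string> FetchAll(IList<string> urls, Func<string, string> fetch)
    {
        var workers = new List<PageWorker>();
        var threads = new List<Thread>();
        foreach (string url in urls)
        {
            var w = new PageWorker(url, fetch);
            var t = new Thread(w.Run);
            workers.Add(w);
            threads.Add(t);
            t.Start();
        }

        foreach (Thread t in threads) t.Join(); // Join also makes each Result write visible here.

        var results = new Dictionary<string, string>();
        foreach (PageWorker w in workers) results[w.Url] = w.Result;
        return results;
    }
}
```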

So thanks for the advice. I'm experimenting with locking the variable I've brought in by ref, and also investigating delegates, as that's a new concept to me too!
 
I don't do C#, but you can use something similar to this principle :). Hopefully the comments demonstrate it well. You only need locking when multiple threads are reading/writing the same data. Thread creation/join etc. should be similar, because they are in Java and C# is basically a copy :p Delegates would work as well. You could do the processing of your data as soon as it is received, but that makes it slightly more complicated and might not be possible, depending on what data you need for the processing.

Code:
# Example to demonstrate fetching multiple files at once, by una.
require 'open-uri'

# Maps each URL to its file data.
hash_table = {}
hash_lock = Mutex.new # Several threads write to hash_table, so guard it.
threads = []

["http://www.google.co.uk", "http://www.darkfibre2.net", "http://news.bbc.co.uk"].each { |file|
  threads << Thread.new { # Create a separate thread for each file to fetch.
    data = URI.open(file).read # Fetch the file (the slow, blocking part).
    hash_lock.synchronize { hash_table[file] = data } # Enter it into the hash table.
  }
}

threads.each { |thread| thread.join } # Wait for all threads to finish before carrying on.

puts hash_table # Print the whole hash table, or do your processing: all data is present at this point.

Edit: Hmm, looking at it, that may confuse you since it looks nothing like C#, haha, oops. I'm sure someone else will be able to give you a C# example.
 
Seriously, all you need to do is use a delegate that each thread calls back when it has its data, then lock the hash table as you write to it. No need for any of this messing about :)
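A sketch of the callback approach. Invoking a delegate runs it on whichever thread calls it, so the callback below executes on each worker thread rather than the main thread; that is why the lock inside it is needed. `PageFetchedCallback`, `CallbackFetch` and the `fetch` parameter are hypothetical names, and a `Hashtable` is used only to match the OP's code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

public delegate void PageFetchedCallback(int id, string source);

public static class CallbackFetch
{
    public static Hashtable FetchAll(IDictionary<int, string> urls, Func<string, string> fetch)
    {
        var source = new Hashtable();

        // The callback: each worker thread calls this when its page arrives.
        PageFetchedCallback onFetched = (id, page) =>
        {
            lock (source.SyncRoot) { source[id] = page; } // Serialize writes to the Hashtable.
        };

        var threads = new List<Thread>();
        foreach (KeyValuePair<int, string> entry in urls)
        {
            int id = entry.Key;
            string url = entry.Value;
            var t = new Thread(() => onFetched(id, fetch(url))); // Fetch, then report back.
            threads.Add(t);
            t.Start();
        }

        foreach (Thread t in threads) t.Join(); // All callbacks have run once every thread has ended.
        return source;
    }
}
```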
 
Righty ho, here is a cut-down version of what I have working so far:

Code:
        static string XblGetPage(String url, CookieContainer cookieJar)
        {
                //Code to get the page and return it as a string
        }

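        // Delegate matching XblGetPage's signature, so it can be called asynchronously.
        delegate string XblGetPageDelegate(String url, CookieContainer cookieJar);

        static Hashtable XblGetSource(Hashtable gamertags, CookieContainer cookieJar)
        {
            Hashtable source = new Hashtable();

            XblGetPageDelegate foo = new XblGetPageDelegate(XblGetPage);
            Hashtable threads = new Hashtable();
            foreach (Int16 i in gamertags.Keys)
            {
                // BeginInvoke queues the call on the ThreadPool and returns an IAsyncResult immediately
                // (delegate BeginInvoke is only supported on .NET Framework).
                threads.Add(i, foo.BeginInvoke("http://live.xbox.com/en-GB/profile/Achievements/ViewAchievementSummary.aspx?compareTo=" + gamertags[i], cookieJar, null, null));
            }

            foreach (Int16 i in threads.Keys)
            {
                // EndInvoke blocks until that call has finished, then returns its string
                // (and rethrows any exception the call threw).
                source.Add(i, foo.EndInvoke(threads[i] as IAsyncResult));
            }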

            // etc...

Apart from error handling, the only glaring problem is that on the live system this would invoke 400+ threads (unless .NET has a thread cap?), so I will rejig the code to create, say, 5 threads at once, add their results to the source hashtable, then do the next five, etc.
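A sketch of the "five at a time" idea: start a batch of threads, `Join` them all, store their results, then move on to the next batch. `BatchedFetch`, `batchSize` and `fetch` are hypothetical names. For what it's worth, delegate `BeginInvoke` already runs its calls on the .NET ThreadPool, so 400 calls are queued rather than each getting a dedicated thread; batching matters more when creating `Thread` objects directly, as here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class BatchedFetch
{
    // Returns the pages keyed by their index in 'urls'.
    public static Dictionary<int, string> FetchInBatches(IList<string> urls, Func<string, string> fetch, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var results = new Dictionary<int, string>();

        for (int start = 0; start < urls.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, urls.Count - start);
            var pages = new string[count];   // One slot per thread in this batch.
            var threads = new Thread[count];

            for (int j = 0; j < count; j++)
            {
                int slot = j;                // Each thread writes only its own slot, so no lock is needed.
                string url = urls[start + j];
                threads[j] = new Thread(() => pages[slot] = fetch(url));
                threads[j].Start();
            }

            foreach (Thread t in threads) t.Join(); // Wait for the whole batch before starting the next.

            for (int j = 0; j < count; j++) results[start + j] = pages[j];
        }

        return results;
    }
}
```

The trade-off is that each batch waits for its slowest request; a fixed set of workers pulling from a shared queue avoids that, at the cost of a little more code.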

Thanks for the help, guys. Do critique the above in case I'm doing something naughty.

Edit: I'm still learning about delegates. I think it would also be useful to check whether the thread has finished, rather than just blocking on EndInvoke, or I may hold up processing. Lots to read up on :D
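If the aim is to process each page as soon as it arrives, instead of blocking on the results in a fixed order, one option on .NET 4 and later is `BlockingCollection<T>`. This is a different technique from `BeginInvoke`/`EndInvoke`: worker threads add results as they finish, and the main thread takes them in completion order. `ProcessAsCompleted`, `fetch` and `process` are hypothetical names, and error handling is omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ProcessAsCompleted
{
    public static List<string> Run(IList<string> urls, Func<string, string> fetch, Func<string, string> process)
    {
        var finished = new BlockingCollection<string>(); // Thread-safe queue of fetched pages.
        int remaining = urls.Count;
        if (remaining == 0) finished.CompleteAdding();   // Nothing to wait for.

        foreach (string url in urls)
        {
            string u = url;
            new Thread(() =>
            {
                finished.Add(fetch(u));
                // The last worker to finish marks the collection complete, which ends the loop below.
                if (Interlocked.Decrement(ref remaining) == 0) finished.CompleteAdding();
            }).Start();
        }

        var processed = new List<string>();
        // Blocks only while nothing is ready; yields pages in the order they finished.
        foreach (string page in finished.GetConsumingEnumerable())
            processed.Add(process(page));
        return processed;
    }
}
```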
 
This is probably wrong, as I don't know much about writing multithreaded code, but wouldn't using the ThreadPool be easy?
Like this: www.vai-net.co.uk/threadpool.cs

By default it is limited to 25 active threads at a time, but this can be changed with ThreadPool.SetMaxThreads.
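A sketch of the ThreadPool route: each URL becomes a queued work item, results go into a shared dictionary under a lock, and a `CountdownEvent` (.NET 4+) lets the caller wait until every item has reported in. `PoolFetch` and `fetch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolFetch
{
    public static Dictionary<string, string> FetchAll(IList<string> urls, Func<string, string> fetch)
    {
        var results = new Dictionary<string, string>();
        var done = new CountdownEvent(urls.Count); // Reaches zero when every work item has signalled.

        foreach (string url in urls)
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                string u = (string)state;
                string page = fetch(u);
                lock (results) { results[u] = page; } // Pool threads share 'results', so lock it.
                done.Signal();
            }, url);
        }

        done.Wait(); // Items beyond the pool's thread limit simply wait in its queue until a thread frees up.
        return results;
    }
}
```

The limit itself can be adjusted with `ThreadPool.SetMaxThreads(workerThreads, completionPortThreads)`.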
 
What do you mean?
Right, run this and see the difference (forgive the lack of comments and the variable-naming naffness :p)
Code:
    using System;
    using System.Collections.Generic;
    using System.Threading;

    public class WorkerThread
    {
        public delegate void ThreadDelegate(string str);

        Random rnd;
        ThreadDelegate _returnDataMethod;
        string _str = "";

        public WorkerThread(int threadNum, ThreadDelegate returnMethod)
        {
            rnd = new Random(threadNum);

            _str = "Thread " + threadNum.ToString();
            _returnDataMethod = returnMethod;
        }

        public void WorkerMethod()
        {
            //Thread.Sleep(rnd.Next(200, 1000));
            Thread.Sleep(1000);
            _returnDataMethod(_str);
        }
    }

    public class Program
    {
        int numTests = 100;

        static void Main(string[] args)
        {
            Program program = new Program();
            DateTime start;
            double asyncTime, threadTime;

            Console.WriteLine("Starting Tests...");

            start = DateTime.Now;
            program.TestAsync();
            asyncTime = (double)DateTime.Now.Subtract(start).Ticks;
            Console.WriteLine(String.Format("Async Test Completed in {0} ticks", asyncTime));

            start = DateTime.Now;
            program.TestThread();
            threadTime = (double)DateTime.Now.Subtract(start).Ticks;
            Console.WriteLine(String.Format("Threaded Test Completed in {0} ticks", threadTime));

            Console.WriteLine(String.Format("Threaded Test completed in {0}% the time of Async Test", Math.Round((threadTime / asyncTime) * 100)));

            Console.ReadLine();
        }

        /*********** Async delegate (BeginInvoke/EndInvoke: .NET Framework only) ***********/
       
        List<string> lstAsyncReturns = new List<string>();

        private string AsyncMethod(int callNum)
        {
            Random rnd = new Random(callNum);

            //Thread.Sleep(rnd.Next(200, 1000));
            Thread.Sleep(1000);
            return String.Format("Call {0}", callNum);
        }

        delegate string AsyncDelegate(int callNum);

        public void TestAsync()
        {
            AsyncDelegate asyncTest = new AsyncDelegate(AsyncMethod);
            List<IAsyncResult> asyncResults = new List<IAsyncResult>();

            for (int cNum = 1; cNum <= numTests; cNum++)
            {
                asyncResults.Add(asyncTest.BeginInvoke(cNum, null, null));
            }

            foreach (IAsyncResult asyncResult in asyncResults)
            {
                lstAsyncReturns.Add(asyncTest.EndInvoke(asyncResult));
            }

            foreach (string item in lstAsyncReturns)
            {
                //Console.WriteLine(item);
            }
        }


        /*********** Threaded with callback delegate ***********/

        int activeThreads = 0;
        List<string> lstReturns = new List<string>();

        public void ThreadDelegateMethod(string str)
        {
            lock (lstReturns)
            {
                activeThreads--;
                lstReturns.Add(str);
                Monitor.Pulse(lstReturns);
            }
        }

        public void TestThread()
        {
            for (int tNum = 1; tNum <= numTests; tNum++)
            {
                lock (lstReturns) { activeThreads++; } // Same lock as the workers' decrement, so no update is lost.

                WorkerThread newWorker = new WorkerThread(tNum, new WorkerThread.ThreadDelegate(ThreadDelegateMethod));
                Thread thread = new Thread(newWorker.WorkerMethod);
                thread.Start();
            }

            lock (lstReturns)
            {
                while (activeThreads > 0) Monitor.Wait(lstReturns);
            }

            foreach (string item in lstReturns)
            {
                //Console.WriteLine(item);
            }
        }
    }
 
Thanks again, guys. I haven't touched threading since operating systems theory back at uni, and that was... well, all theory on pieces of paper :o

Thanks for the code; I'm going to step through it and get a better understanding of all this.
 
25 threads is plenty, and probably too many.

The ideal is to have as many threads as there are processors in the system. Anything over this technically reduces performance because of context switching, but the side effect is increased concurrency. It depends which trade-off your application benefits from.

ThreadPool has an internal queue, so even if all its threads are busy doing other jobs, a new work item just gets added to the queue and is processed as soon as a thread is free.

ThreadPool is not as elegant as delegates, and a bit more work is needed if you want to "return values" from it. Personally, for a simple "fire and forget" I would use ThreadPool, as it is lighter weight. But for retaining structure and easily getting returned values, I would use delegates, i.e. BeginInvoke/EndInvoke.
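On .NET 4 and later, `Task<TResult>` removes most of that extra work: `Task.Run` (.NET 4.5+) queues a delegate on the ThreadPool, and the returned task carries the delegate's return value. It is also worth knowing that delegate `BeginInvoke`/`EndInvoke` is only supported on .NET Framework; on .NET Core and .NET 5+ it throws `PlatformNotSupportedException`. A sketch, with hypothetical `TaskFetch` and `fetch` names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskFetch
{
    public static Dictionary<string, string> FetchAll(IList<string> urls, Func<string, string> fetch)
    {
        // Each Task.Run queues a work item on the ThreadPool and returns a Task<string>.
        Dictionary<string, Task<string>> tasks =
            urls.Distinct().ToDictionary(u => u, u => Task.Run(() => fetch(u)));

        Task.WaitAll(tasks.Values.ToArray()); // Block until every fetch has finished.

        // .Result is the value each work item returned; no shared collection or lock needed.
        return tasks.ToDictionary(kv => kv.Key, kv => kv.Value.Result);
    }
}
```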
 
The ideal of one thread per processor only works if your threads are going to be using 100% CPU. In this and many (most?) other cases, where you're waiting for some external event before continuing work, you ideally want many more threads all running and waiting concurrently.
 
Yup, loads of variables affect it. The OP is doing some I/O (i.e. requesting a file from the Internet), which is definitely going to block the thread for a while.

The "ideal" I mentioned is just that, an ideal: the theoretical best. Sometimes there are things you can do to get very close to it (or exactly on it); other times, and probably most often, there aren't. Out of scope for this thread, TBH :)
 